@pintaoz-aws pintaoz-aws commented Feb 14, 2025

Issue #, if available:
#2630
Description of changes:
Add framework_version to all TensorFlowModel examples
Testing done:

Merge Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.

General

  • I have read the CONTRIBUTING doc
  • I certify that the changes I am introducing will be backward compatible, and I have discussed concerns about this, if any, with the Python SDK team
  • I used the commit message format described in CONTRIBUTING
  • I have passed the region in to all S3 and STS clients that I've initialized as part of this change.
  • I have updated any necessary documentation, including READMEs and API docs (if appropriate)

Tests

  • I have added tests that prove my fix is effective or that my feature works (if appropriate)
  • I have added unit and/or integration tests as appropriate to ensure backward compatibility of the changes
  • I have checked that my tests are not configured for a specific region or account (if appropriate)
  • I have used unique_name_from_base to create resource names in integ tests (if appropriate)
  • If adding any dependency in requirements.txt files, I have spell checked and ensured they exist in PyPi

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@pintaoz-aws pintaoz-aws requested a review from a team as a code owner February 14, 2025 00:04
 from sagemaker.tensorflow import TensorFlowModel
-model = TensorFlowModel(model_data='s3://mybucket/model.tar.gz', role='MySageMakerRole')
+model = TensorFlowModel(model_data='s3://mybucket/model.tar.gz', role='MySageMakerRole', framework_version='MyFrameworkVersion')
Contributor

Does framework_version expect an int? Can we use a dummy value such as 0.0.0 or x.x.x?

Contributor Author

Updated to x.x.x.
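For readers following the thread: `x.x.x` in the updated examples is a placeholder that users replace with a real TensorFlow version. The sketch below is not from the PR. It assumes `framework_version` takes a dotted version string (for example `'2.3.0'`, a value picked purely for illustration) and uses two hypothetical helpers, `looks_like_framework_version` and `build_model_kwargs`, which are not part of the SageMaker SDK, to check the format before the arguments reach `TensorFlowModel`.

```python
import re

# Hypothetical pattern for illustration: two or three dot-separated numbers.
_VERSION_RE = re.compile(r"\d+(\.\d+){1,2}")


def looks_like_framework_version(value):
    """Return True if value is a string such as "2.3" or "2.3.0"."""
    return isinstance(value, str) and _VERSION_RE.fullmatch(value) is not None


def build_model_kwargs(model_data, role, framework_version):
    """Collect keyword arguments for TensorFlowModel after a basic format check."""
    if not looks_like_framework_version(framework_version):
        raise ValueError(
            "framework_version should be a version string like '2.3.0', "
            f"got {framework_version!r}"
        )
    return {
        "model_data": model_data,
        "role": role,
        "framework_version": framework_version,
    }


kwargs = build_model_kwargs(
    model_data="s3://mybucket/model.tar.gz",
    role="MySageMakerRole",
    framework_version="2.3.0",  # assumed example value; stands in for x.x.x
)
print(kwargs["framework_version"])

# With the SageMaker SDK installed, the call would then be:
#   from sagemaker.tensorflow import TensorFlowModel
#   model = TensorFlowModel(**kwargs)
```

The check only validates the shape of the string. Whether a given version is actually supported depends on the SDK release and region, so the SDK documentation remains the place to look that up.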

@pintaoz-aws pintaoz-aws merged commit 5682c42 into aws:master Feb 17, 2025
12 of 14 checks passed
evakravi pushed a commit to evakravi/sagemaker-python-sdk that referenced this pull request Mar 20, 2025
* Add framework_version to all TensorFlowModel examples

* update framework_version to x.x.x

---------

Co-authored-by: pintaoz <[email protected]>
pravali96 pushed a commit to pravali96/sagemaker-python-sdk that referenced this pull request Apr 21, 2025
* Add framework_version to all TensorFlowModel examples

* update framework_version to x.x.x

---------

Co-authored-by: pintaoz <[email protected]>
uyoldas pushed a commit to uyoldas/sagemaker-python-sdk that referenced this pull request May 23, 2025
* Add framework_version to all TensorFlowModel examples

* update framework_version to x.x.x

---------

Co-authored-by: pintaoz <[email protected]>
sage-maker added a commit that referenced this pull request Jul 1, 2025
* feature: integrate amtviz for visualization of tuning jobs

* Move RecordSerializer and RecordDeserializer to sagemaker.serializers and sagemaker.deserializers (#5037)

* Move RecordSerializer and RecordDeserializer to sagemaker.serializers and sagemaker.deserializers

* fix codestyle

* fix test

---------

Co-authored-by: pintaoz <[email protected]>

* Add framework_version to all TensorFlowModel examples (#5038)

* Add framework_version to all TensorFlowModel examples

* update framework_version to x.x.x

---------

Co-authored-by: pintaoz <[email protected]>

* Fix hyperparameter strategy docs (#5045)

* fix: pass in inference_ami_version to model_based endpoint type (#5043)

* fix: pass in inference_ami_version to model_based endpoint type

* documentation: update contributing.md w/ venv instructions and pip install fixes

---------

Co-authored-by: Zhaoqi <[email protected]>

* Add warning about not supporting torch.nn.SyncBatchNorm (#5046)

* Add warning about not supporting

* update wording

---------

Co-authored-by: pintaoz <[email protected]>

* prepare release v2.239.2

* update development version to v2.239.3.dev0

* change: update image_uri_configs  02-19-2025 06:18:15 PST

* fix: codestyle, type hints, license, and docstrings

* documentation: add docstring for amtviz module

* fix: fix docstyle and flake8 errors

* fix: code reformat using black

---------

Co-authored-by: Uemit Yoldas <[email protected]>
Co-authored-by: pintaoz-aws <[email protected]>
Co-authored-by: pintaoz <[email protected]>
Co-authored-by: parknate@ <[email protected]>
Co-authored-by: timkuo-amazon <[email protected]>
Co-authored-by: Zhaoqi <[email protected]>
Co-authored-by: ci <ci>
Co-authored-by: sagemaker-bot <[email protected]>
HollowTube pushed a commit to HollowTube/sagemaker-python-sdk that referenced this pull request Jul 15, 2025
* feature: integrate amtviz for visualization of tuning jobs

* Move RecordSerializer and RecordDeserializer to sagemaker.serializers and sagemaker.deserializers (aws#5037)

* Move RecordSerializer and RecordDeserializer to sagemaker.serializers and sagemaker.deserializers

* fix codestyle

* fix test

---------

Co-authored-by: pintaoz <[email protected]>

* Add framework_version to all TensorFlowModel examples (aws#5038)

* Add framework_version to all TensorFlowModel examples

* update framework_version to x.x.x

---------

Co-authored-by: pintaoz <[email protected]>

* Fix hyperparameter strategy docs (aws#5045)

* fix: pass in inference_ami_version to model_based endpoint type (aws#5043)

* fix: pass in inference_ami_version to model_based endpoint type

* documentation: update contributing.md w/ venv instructions and pip install fixes

---------

Co-authored-by: Zhaoqi <[email protected]>

* Add warning about not supporting torch.nn.SyncBatchNorm (aws#5046)

* Add warning about not supporting

* update wording

---------

Co-authored-by: pintaoz <[email protected]>

* prepare release v2.239.2

* update development version to v2.239.3.dev0

* change: update image_uri_configs  02-19-2025 06:18:15 PST

* fix: codestyle, type hints, license, and docstrings

* documentation: add docstring for amtviz module

* fix: fix docstyle and flake8 errors

* fix: code reformat using black

---------

Co-authored-by: Uemit Yoldas <[email protected]>
Co-authored-by: pintaoz-aws <[email protected]>
Co-authored-by: pintaoz <[email protected]>
Co-authored-by: parknate@ <[email protected]>
Co-authored-by: timkuo-amazon <[email protected]>
Co-authored-by: Zhaoqi <[email protected]>
Co-authored-by: ci <ci>
Co-authored-by: sagemaker-bot <[email protected]>
mohamedzeidan2021 added a commit that referenced this pull request Dec 3, 2025
* Add framework_version to all TensorFlowModel examples (#5038)

* Add framework_version to all TensorFlowModel examples

* update framework_version to x.x.x

---------

Co-authored-by: pintaoz <[email protected]>

* Fix hyperparameter strategy docs (#5045)

* fix: pass in inference_ami_version to model_based endpoint type (#5043)

* fix: pass in inference_ami_version to model_based endpoint type

* documentation: update contributing.md w/ venv instructions and pip install fixes

---------

Co-authored-by: Zhaoqi <[email protected]>

* Add warning about not supporting torch.nn.SyncBatchNorm (#5046)

* Add warning about not supporting

* update wording

---------

Co-authored-by: pintaoz <[email protected]>

* prepare release v2.239.2

* update development version to v2.239.3.dev0

* change: update image_uri_configs  02-19-2025 06:18:15 PST

* change: added ap-southeast-7 and mx-central-1 for Jumpstart (#5049)

* added ap-southeast-7 and mx-central-1 for Jumpstart

* added BKK dlc to djl-neuronx

---------

Co-authored-by: Isha Chidrawar <[email protected]>

* prepare release v2.239.3

* update development version to v2.239.4.dev0

* change: update image_uri_configs  02-20-2025 06:18:08 PST

* feat: Add support for TGI Neuronx 0.0.27 and HF PT 2.3.0 image in PySDK (#5050)

Co-authored-by: malavhs <[email protected]>

* Add backward compatibility for RecordSerializer and RecordDeserializer (#5052)

* Add backward compatibility for RecordSerializer and RecordDeserializer

* fix circular import

* fix test

---------

Co-authored-by: pintaoz <[email protected]>

* py_version doc fixes (#5048)

* change: update image_uri_configs  02-21-2025 06:18:10 PST

* fix: altconfig hubcontent and reenable integ test (#5051)

* fix altconfig hubcontent and reenable integ test

* linting

* update exception thrown

* feat: Add support for TGI Neuronx 0.0.27 and HF PT 2.3.0 image in PySDK (#5050)

Co-authored-by: malavhs <[email protected]>

* add test

* update predictor spec accessor

* lint

* set custom field from HCD config to model spec data class

* lint

* remove logs

* last update

---------

Co-authored-by: Malav Shastri <[email protected]>
Co-authored-by: malavhs <[email protected]>

* fix: forbid extras in Configs (#5042)

* fix: make configs safer

* fix: safer destructor in ModelTrainer

* format

* Update error message

* pylint

* Create BaseConfig

* Remove main function entrypoint in ModelBuilder dependency manager. (#5058)

* Remove main function entrypoint in ModelBuilder dependency manager.

* Remove main function entrypoint in ModelBuilder dependency manager.

---------

Co-authored-by: Joseph Zhang <[email protected]>

* documentation: Removed a line about python version requirements of training script which can misguide users. (#5057)

* change: Allow telemetry only in supported regions

* change: Allow telemetry only in supported regions

* change: Allow telemetry only in supported regions

* change: Allow telemetry only in supported regions

* change: Allow telemetry only in supported regions

* documentation: Removed a line about python version requirements of training script which can misguide users. Training script can be of latest version based on the support provided by framework_version of the container

---------

Co-authored-by: Roja Reddy Sareddy <[email protected]>

* prepare release v2.240.0

* update development version to v2.240.1.dev0

* Fix key error in _send_metrics() (#5068)

Co-authored-by: pintaoz <[email protected]>

* fix: Added check for the presence of model package group before creating one (#5063)

Co-authored-by: Keshav Chandak <[email protected]>

* Use sagemaker session's s3_resource in download_folder (#5064)

Co-authored-by: pintaoz <[email protected]>

* Fix error when there is no session to call _create_model_request() (#5062)

* Fix error when there is no session to call _create_model_request()

* Fix codestyle

---------

Co-authored-by: pintaoz <[email protected]>

* Ensure Model.is_repack() returns a boolean (#5060)

* Ensure Model.is_repack() returns a boolean

* update test

---------

Co-authored-by: pintaoz <[email protected]>

* feat: Allow ModelTrainer to accept hyperparameters file (#5059)

* Allow ModelTrainer to accept hyperparameter file and create Hyperparameter class

* pylint

* Detect hyperparameters from contents rather than file extension

* pylint

* change: add integs

* change: add integs

* change: remove custom hyperparameter tooling

* Add tests for hp contracts

* change: add unit tests and remove unreachable condition

* fix integs

* doc check fix

* fix tests

* fix tox.ini

* add unit test

* feature: support training for JumpStart model references as part of Curated Hub Phase 2 (#5070)

* change: update image_uri_configs  01-27-2025 06:18:13 PST

* fix: skip TF tests for unsupported versions (#5007)

* fix: skip TF tests for unsupported versions

* flake8

* change: update image_uri_configs  01-29-2025 06:18:08 PST

* chore: add new images for HF TGI (#5005)

* feat: add pytorch-tgi-inference 2.4.0

* add tgi 3.0.1 image

* skip faulty test

* formatting

* formatting

* add hf pytorch training 4.46

* update version alias

* add py311 to training version

* update tests with pyversion 311

* formatting

---------

Co-authored-by: Erick Benitez-Ramos <[email protected]>

* feat: use jumpstart deployment config image as default optimization image (#4992)

Co-authored-by: Erick Benitez-Ramos <[email protected]>

* prepare release v2.238.0

* update development version to v2.238.1.dev0

* Fix ssh host policy (#4966)

* Fix ssh host policy

* Filter policy by algo-

* Add docstring

* Fix pylint

* Fix docstyle summary

* Unit test

* Fix unit test

* Change to unit test

* Fix unit tests

* Test comment out flaky tests

* Readd the flaky tests

* Remove flaky asserts

* Remove flaky asserts

---------

Co-authored-by: Erick Benitez-Ramos <[email protected]>

* change: Allow telemetry only in supported regions (#5009)

* change: Allow telemetry only in supported regions

* change: Allow telemetry only in supported regions

* change: Allow telemetry only in supported regions

* change: Allow telemetry only in supported regions

* change: Allow telemetry only in supported regions

---------

Co-authored-by: Roja Reddy Sareddy <[email protected]>

* mpirun protocol - distributed training with @remote decorator (#4998)

* implemented multi-node distribution with @remote function

* completed unit tests

* added distributed training with CPU and torchrun

* backwards compatibility nproc_per_node

* fixing code: permissions for non-root users, integration tests

* fixed docstyle

* refactor nproc_per_node for backwards compatibility

* refactor nproc_per_node for backwards compatibility

* pylint fix, newlines

* added unit tests for bootstrap_environment remote

* added  mpirun protocol for distributed training with @remote decorator

* aligned mpi_utils_remote.py to mpi_utils.py for estimator

* updated docstring for sagemaker sdk doc

---------

Co-authored-by: Erick Benitez-Ramos <[email protected]>

* feat: Add support for deepseek recipes (#5011)

* feat: Add support for deepseek recipes

* pylint

* add unit test

* feat: [JumpStart] Add access configs and training instance type variants artifact uri handling for Curated Hub Phase 2 training integration (#1653)

* Add access config to training input for Curated Hub Training Integration

* Add support to retrieve instance specific training artifact keys

* Fix some typos and naming issues

* Fix more typos

* fix formatting issues with black

* modify access config logic so accept_eula is passed into fit

* update black formatting

* Add more unit tests for passing access configs

* fix style errors

* fix for failing integ test

* fix styles and integ test error

* skip blocking integ test

* fix formatting

* remove env vars when access configs are being used

* fix docstyle issue

* update usage of access configs, remove conversion of training artifact key to uri

* fix styling issues

* fix styling issues

* fix unit tests

* fix adding hubaccessconfig only if hubcontentarn exists

* move logic to JumpStartEstimator from Job

* Fix styling issues

* Remove unused code

* fix styling issues

* fix unit test failure

* fix some formatting, add comments

* remove typing for estimator in get_access_configs function

* fix circular import dependency

* fix styling issues

---------

Co-authored-by: Erick Benitez-Ramos <[email protected]>

* Always add code channel, regardless of network isolation (#1657)

* fix formatting issue

* fix formatting issue

* fix formatting issue

* fix tensorflow file

---------

Co-authored-by: sagemaker-bot <[email protected]>
Co-authored-by: Erick Benitez-Ramos <[email protected]>
Co-authored-by: varunmoris <[email protected]>
Co-authored-by: Gary Wang <[email protected]>
Co-authored-by: ci <ci>
Co-authored-by: parknate@ <[email protected]>
Co-authored-by: rsareddy0329 <[email protected]>
Co-authored-by: Roja Reddy Sareddy <[email protected]>
Co-authored-by: Bruno Pistone <[email protected]>

* feat: Make DistributedConfig Extensible (#5039)

* feat: Make DistributedConfig Extensible

* pylint

* Include none types when creating config jsons for safer reference

* fix: update test to account for changes

* format

* Add integ test

* pylint

* prepare release v2.240.0

* update development version to v2.240.1.dev0

* Fix key error in _send_metrics() (#5068)

Co-authored-by: pintaoz <[email protected]>

* fix: Added check for the presence of model package group before creating one (#5063)

Co-authored-by: Keshav Chandak <[email protected]>

* Use sagemaker session's s3_resource in download_folder (#5064)

Co-authored-by: pintaoz <[email protected]>

* remove union

* fix merge artifact

* Change dir path to distributed_drivers

* update paths

---------

Co-authored-by: ci <ci>
Co-authored-by: pintaoz-aws <[email protected]>
Co-authored-by: pintaoz <[email protected]>
Co-authored-by: Keshav Chandak <[email protected]>
Co-authored-by: Keshav Chandak <[email protected]>

* Skip tests with deprecated instance type (#5077)

Co-authored-by: pintaoz <[email protected]>

* prepare release v2.241.0

* update development version to v2.241.1.dev0

* pipeline definition function doc update (#5074)

Co-authored-by: Rohan Gujarathi <[email protected]>

* feat: add integ tests for training JumpStart models in private hub (#5076)

* feat: add integ tests for training JumpStart models in private hub

* fixed formatting

* remove unused imports

* fix unused imports

* fix unit test failure and fix bug around versioning

* fix formatting

* fix unit tests

* fix model_uri usage issue

* fix some formatting

* separate private hub setup code

* add try catch block

* fix flake8 issue so except clause is not bare

* black formatting

* fix: resolve infinite loop in _find_config on Windows systems (#4970)

* fix: resolve Windows path handling in _find_config

* Replace Path.match("/") with Path.anchor comparison
* Fix infinite loop in _studio.py path traversal

* test: Add tests for the new root path exploration

* Fix formatting style

* Fixed line too long

* Fix docstyle by running black manually

* Fix testcase with \\ when running on non-windows machines

* Fix formatting style

* cleanup unused import

* change: update image_uri_configs  03-11-2025 07:18:09 PST

* Fixing Pytorch training python version in tests (#5084)

* Fixing Pytorch training python version in tests

* Updating Inference test handling

* remove s3 output location requirement from hub class init (#5081)

* remove s3 output location requirement from hub class init

* fix integ test hub

* lint

* fix test

---------

Co-authored-by: Gokul Anantha Narayanan <[email protected]>

* fix: Prevent RunContext overlap between test_run tests (#5083)

Co-authored-by: Gokul Anantha Narayanan <[email protected]>

* Torch upgrade (#5086)

* Fix Flake8 Violations

* UPDATE PYTORCH VERSION TO ADDRESS SECURITY RISK

**Description**

Currently used Pytorch version has a possible vulnerability.

Internal - https://tiny.amazon.com/p5i4jla1

**Testing Done**

Unit and Integration tests in the CodeBuild

* Revert CPU Versions

* Test Fix

* Codestyle fixes

* debug attempt

* Fixes

* Fix

* Fix

* prepare release v2.242.0

* update development version to v2.242.1.dev0

* add new regions to JUMPSTART_LAUNCHED_REGIONS (#5089)

Co-authored-by: isha chidrawar <[email protected]>
Co-authored-by: Gokul Anantha Narayanan <[email protected]>

* ADD Documentation to ReadtheDocs for Upgrading torch versions (#5090)

* ADD Documentation to ReadtheDocs for Upgrading torch versions

**Description**

**Testing Done**
Only documentation updates

* Fix for Codestyle

* Remove unused import

* Flake8 Fix

* CodeStyle Fixes

* feature: Enabled update_endpoint through model_builder (#5085)

* feature: Enabled update_endpoint through model_builder

* fix: fix unit test, black-check, pylint errors

* fix: fix black-check, pylint errors

---------

Co-authored-by: Roja Reddy Sareddy <[email protected]>

* fix: factor in set instance type when building JumpStart models in ModelBuilder. (#5093)

* Remove main function entrypoint in ModelBuilder dependency manager.

* Remove main function entrypoint in ModelBuilder dependency manager.

* fix: factor in set instance type when building JumpStart models in ModelBuilder.

* Remove default instance type from ModelBuilder.

* Restore default instance type. Tweak integ test.

---------

Co-authored-by: Joseph Zhang <[email protected]>

* change: update image_uri_configs  03-21-2025 07:17:55 PST

* Skip tests failed due to deprecated instance type (#5097)

Co-authored-by: pintaoz <[email protected]>

* Feat: Added support for returning most recently created approved model package in a group (#5092)

Co-authored-by: Keshav Chandak <[email protected]>

* change: update image_uri_configs  03-25-2025 07:18:13 PST

* chore: fix integ tests to use latest version of model (#5104)

* change: update image_uri_configs  03-26-2025 07:18:16 PST

* Update Jinja version (#5101)

* Aligned disable_output_compression for @remote with Estimator (#5094)

* Update transformers version (#5102)

* fix: use temp file in unit tests (#5106)

* fix: fix flaky spark processor integ (#5109)

* fix: fix flaky spark processor integ

* format

* fix: fix flaky clarify model monitor test (#5107)

* chore: move jumpstart region definitions to json file (#5095)

* chore: move jumpstart region definitions to json file

* chore: address formatting issues

* fix: neo regions not ga in 5 regions

* chore: make variable private

---------

Co-authored-by: Erick Benitez-Ramos <[email protected]>

* change: Update for PT 2.5.1, SMP 2.8.0 (#5071)

* prepare release v2.243.0

* update development version to v2.243.1.dev0

* fix: flaky test (#5111)

* chore: fix semantic versioning for wildcard identifier (#5105)

* Add mlflow tracking arn telemetry (#5113)

Integ test failure is aligned with CI health

* Master (#5112)

* fix integ test hub

* lint

* fix jumpstart curated hub bugs

* lint

* fix tests

* linting

* lint

* rm test file

* fix test

* fix

* lint

* remove test

* update for test

* documentation: update ModelStep data dependency info (#5120)

Co-authored-by: Namrata Madan <[email protected]>

* Update instance gpu info (#5119)

* fix: remove historical job_name caching which causes long job name (#5118)

* Fix issue #4856 by copying environment variables (#5115)

* Fix issue #4856 by copying environment variables

* Added handler for pipeline variable while creating process job (#5122)

* change: Allow telemetry only in supported regions

* change: Allow telemetry only in supported regions

* change: Allow telemetry only in supported regions

* change: Allow telemetry only in supported regions

* change: Allow telemetry only in supported regions

* documentation: Removed a line about python version requirements of training script which can misguide users. Training script can be of latest version based on the support provided by framework_version of the container

* feature: Enabled update_endpoint through model_builder

* fix: fix unit test, black-check, pylint errors

* fix: fix black-check, pylint errors

* fix: Added handler for pipeline variable while creating process job

* fix: Added handler for pipeline variable while creating process job

---------

Co-authored-by: Roja Reddy Sareddy <[email protected]>

* documentation: update pipelines step caching examples to include more steps (#5121)

Co-authored-by: Brock Wade <[email protected]>

* prepare release v2.243.1

* update development version to v2.243.2.dev0

* Fix deepdiff dependencies (#5128)

* Fix deepdiff dependencies

* trigger tests

* Fix: fix the issue due to PR changes, 5122 (#5124)

* change: Allow telemetry only in supported regions

* change: Allow telemetry only in supported regions

* change: Allow telemetry only in supported regions

* change: Allow telemetry only in supported regions

* change: Allow telemetry only in supported regions

* documentation: Removed a line about python version requirements of training script which can misguide users. Training script can be of latest version based on the support provided by framework_version of the container

* feature: Enabled update_endpoint through model_builder

* fix: fix unit test, black-check, pylint errors

* fix: fix black-check, pylint errors

* fix: Added handler for pipeline variable while creating process job

* fix: Added handler for pipeline variable while creating process job

* Revert the PR changes: #5122, due to issue https://t.corp.amazon.com/P223568185/overview

* Fix: fix the issue, https://t.corp.amazon.com/P223568185/communication

---------

Co-authored-by: Roja Reddy Sareddy <[email protected]>

* fix: tgi image uri unit tests (#5127)

* fix: tgi image uri unit tests

* fix: black-format and flake8 failures

* fix: parse

* fix: print statement

---------

Co-authored-by: Erick Benitez-Ramos <[email protected]>

* prepare release v2.243.2

* update development version to v2.243.3.dev0

* change: update image_uri_configs 04-11-2025 07:18:19 PST

* change: update image_uri_configs 04-15-2025 07:18:10 PST

* change: update image_uri_configs 04-16-2025 07:18:18 PST

* update pr test to deprecate py38 and add py312 (#5133)

* Py312 upgrade step 2: Update dependencies, integ tests and unit tests (#5123)

* clean up

* bump maxdepth for doc/api/training to fix readthedocs

* change maxdepth for readthedocs rendering doc/api/training page

* change maxdepth for readthedocs rendering doc/api/training page

* change maxdepth for readthedocs rendering doc/api/training page

* Revert the PR changes 5122 (#5134)

* change: Allow telemetry only in supported regions

* change: Allow telemetry only in supported regions

* change: Allow telemetry only in supported regions

* change: Allow telemetry only in supported regions

* change: Allow telemetry only in supported regions

* documentation: Removed a line about python version requirements of training script which can misguide users. Training script can be of latest version based on the support provided by framework_version of the container

* feature: Enabled update_endpoint through model_builder

* fix: fix unit test, black-check, pylint errors

* fix: fix black-check, pylint errors

* fix: Added handler for pipeline variable while creating process job

* fix: Added handler for pipeline variable while creating process job

* Revert the PR changes: #5122, due to issue https://t.corp.amazon.com/P223568185/overview

* Fix: fix the issue, https://t.corp.amazon.com/P223568185/communication

* Revert PR 5122 changes, due to issues with other processor codeflows

---------

Co-authored-by: Roja Reddy Sareddy <[email protected]>
Co-authored-by: Zhaoqi <[email protected]>

* update readme to reflect py312 upgrade

* prepare release v2.243.3

* update development version to v2.243.4.dev0

* chore: add huggingface images (#5142)

* Update ModelTrainer to support s3 uri and tar.gz file as source_dir (#5144)

* add s3 uri check to modeltrainer data source

* update ModelTrainer to support s3 uri and tar.gz file as source_dir

* black-format

* add unit and integ tests

* update logic and unit test to raise value error if the file is not .tar.gz

* feature: support custom workflow deployment in ModelBuilder using SMD image. (#5143)

* feature: support custom workflow deployment in ModelBuilder using SMD image. (#1661)

* feature: support custom workflow deployment in ModelBuilder using SMD inference image.

* Rename test case and pass session.

* Address PR comments.

* Tweak resource cleanup logic in integ test.

* Fixing CodeBuild integ test failures.

* Renamed integ test.

* Remove unused integ test, restore once GA.

---------

Co-authored-by: Joseph Zhang <[email protected]>

* Cache client as instance attribute in @property decorator. (#1668)

* Remove @property decorator from ABC definition.

* Cache client as instance attribute in @property.

* Fix flake8 issue.

---------

Co-authored-by: Joseph Zhang <[email protected]>

* Bugfixes from e2e testing. (#1670)

* Fix Albatross Inference component tests

* trigger integ tests

---------

Co-authored-by: cj-zhang <[email protected]>
Co-authored-by: Joseph Zhang <[email protected]>
Co-authored-by: Pravali Uppugunduri <[email protected]>

* fix: pin mamba version to 24.11.3-2 to avoid inconsistent test runs (#5149)

Co-authored-by: Namrata Madan <[email protected]>

* Add model server timeout (#5151)

Co-authored-by: adishaa <[email protected]>

* Add Owner ID check for bucket with path when prefix is provided (#5146)

* Fix Flake8 Violations

* Add Owner ID check for bucket with path when prefix is provided

**Description**

Previously we used the head_bucket call to perform the owner ID check, but this doesn't take into account cases where the S3 path is provided through the prefix.

This change makes sure that directory-level permissions are supported.

**Testing Done**
Tested through unit tests, integ tests and manual testing through the installation file.

Yes

* Address PR comment

* Codestyle fixes

* Minor fix

* Codestyle fixes

* Fix Unit tests

* prepare release v2.244.0

* update development version to v2.244.1.dev0

* chore: Add tei 1.6.0 image (#5145)

* chore: add huggingface images

* chore: add tei 1.6 image

* chore: add tei 1.6.0 to tei mapping in tests

* build(deps): bump mlflow in /tests/data/serve_resources/mlflow/pytorch (#5098)

Bumps [mlflow](https://github.com/mlflow/mlflow) from 2.13.2 to 2.20.3.
- [Release notes](https://github.com/mlflow/mlflow/releases)
- [Changelog](https://github.com/mlflow/mlflow/blob/master/CHANGELOG.md)
- [Commits](https://github.com/mlflow/mlflow/compare/v2.13.2...v2.20.3)

---
updated-dependencies:
- dependency-name: mlflow
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* build(deps): bump mlflow (#5155)

Bumps [mlflow](https://github.com/mlflow/mlflow) from 2.13.2 to 2.20.3.
- [Release notes](https://github.com/mlflow/mlflow/releases)
- [Changelog](https://github.com/mlflow/mlflow/blob/master/CHANGELOG.md)
- [Commits](https://github.com/mlflow/mlflow/compare/v2.13.2...v2.20.3)

---
updated-dependencies:
- dependency-name: mlflow
  dependency-version: 2.20.3
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* build(deps): bump scikit-learn (#5156)

Bumps [scikit-learn](https://github.com/scikit-learn/scikit-learn) from 1.3.2 to 1.5.1.
- [Release notes](https://github.com/scikit-learn/scikit-learn/releases)
- [Commits](https://github.com/scikit-learn/scikit-learn/compare/1.3.2...1.5.1)

---
updated-dependencies:
- dependency-name: scikit-learn
  dependency-version: 1.5.1
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Improve error logging and documentation for issue 4007 (#5153)

* Improve error logging and documentation for issue 4007

* Add hyperlink to RTDs

* fix: fix bad initialization script error message (#5152)

Co-authored-by: Namrata Madan <[email protected]>

* fix: pin test dependency (#5165)

* fix: Map llama models to correct script (#5159)

* fix: honor json serialization of HPs (#5164)

* fix: honor json serialization of HPs

* test

* fix

* chore: Allow omegaconf >=2.2,<3 (#5168)

* Fix type annotations (#5166)

* remove --strip-component for untar source tar.gz (#5163)

* remove --strip-component for untar source tar.gz

* update code.tar.gz in test

---------

Co-authored-by: Erick Benitez-Ramos <[email protected]>

* fix: parameter mismatch in update_endpoint (#5135)

* add AG v1.3 (#5171)

Co-authored-by: Ubuntu <[email protected]>

* Fix test_deploy_with_update_endpoint() (#5177)

Co-authored-by: pintaoz <[email protected]>

* huggingface-tei dlc image_uri (#5174)

Co-authored-by: pintaoz-aws <[email protected]>

* huggingface-neuronx dlc image_uri (#5172)

* huggingface-neuronx dlc image_uri

* huggingface-neuronx inference dlc

---------

Co-authored-by: pintaoz-aws <[email protected]>

* huggingface-llm-neuronx dlc (#5173)

Co-authored-by: pintaoz-aws <[email protected]>

* Fix test_huggingface_tei_uris() (#5178)

* Fix test_huggingface_tei_uris()

* Fix json

---------

Co-authored-by: pintaoz <[email protected]>

* Fix Flask-Limiter version (#5180)

* prepare release v2.244.1

* update development version to v2.244.2.dev0

* change: Improve defaults handling in ModelTrainer (#5170)

* Improve default handling

* format

* add tests & update docs

* fix docstyle

* fix input_data_config

* fix use input_data_config parameter in train as authoritative source

* fix tests

* format

* update checkpoint config

* docstyle

* make config creation backwards compatible

* format

* fix condition

* fix Compute and Networking config when attributes are None

* format

* fix

* format

* change: Add image configs and region config for TPE (ap-east-2) (#5167)

* add image configs and region config for TPE (ap-east-2)

* remove TPE from djl-neuronx

---------

Co-authored-by: isha chidrawar <[email protected]>
Co-authored-by: pintaoz-aws <[email protected]>

* change: update image_uri_configs 05-14-2025 07:18:16 PST

* change: update jumpstart region_config 05-15-2025 07:18:15 PST

* fix: clarify model monitor one time schedule bug (#5169)

* fix: include model channel for gated uncompressed models (#5181)

* prepare release v2.244.2

* update development version to v2.244.3.dev0

* change: update image_uri_configs 05-20-2025 07:18:17 PST

* feat: Correct mypy type checking through PEP 561 (#5027)

Co-authored-by: parknate@ <[email protected]>
Co-authored-by: Molly He <[email protected]>

* change: merge method inputs with class inputs (#5183)

* fix: addWaiterTimeoutHandling (#4951)

* addWaiterTimeoutHandling

* codeStyleUpdate

* updateCodeStyle

---------

Co-authored-by: Ubuntu <[email protected]>
Co-authored-by: Gokul Anantha Narayanan <[email protected]>
Co-authored-by: Ubuntu <[email protected]>

* MLflow update for dependabot (#5187)

* MLflow update for dependabot

* Update lower bound

* Unit test fixes

* prepare release v2.245.0

* update development version to v2.245.1.dev0

* feature: Triton v25.04 DLC (#5188)

Co-authored-by: Mohan Kishore <[email protected]>

* update estimator documentation regarding hyperparameters for source_dir (#5190)

* Update Attrs version to widen support (#5185)

* Update Attrs version to widen support

**Description**

https://github.com/aws/sagemaker-python-sdk/issues/5075

**Testing Done**
Running unit and integ tests

Unit and integ tests passing indicate that this version upgrade does not break anything

* Update version in conda_in_process.yml

* Update test requirements

* MLFlow update version

---
Tested by : Running unit and integ tests

* prepare release v2.246.0

* update development version to v2.246.1.dev0

* fix: Allow import failure for internal _hashlib module (#5192)

* fix: Allow import failure for _hashlib module

* Fix formatting

* Appease flake8

* Add ignore_patterns in ModelTrainer to ignore specific files/folders (#5194)

* Add ignore_patterns in ModelTrainer to ignore specific files/folders

* fix black format

* add unit test

* add default ignore_patterns, fix minor path issue when uploaded to s3

* minor change to fix unit test failure

* add new variables in default ignore_patterns

* fix indentation error in docstring for readthedocs

* Fix: Object of type ModelLifeCycle is not JSON serializable (#5197)

* Fix: Object of type ModelLifeCycle is not JSON serializable

* Fix unit test

* Fix integ tests

* Revert "Fix integ tests"

This reverts commit f6513fe430d7f7f13486239aaaf6983efde2e00f.

* Fix integration tests

---------

Co-authored-by: adishaa <[email protected]>

* change: update jumpstart region_config, update image_uri_configs 06-12-2025 07:18:12 PST

* feat: Add support for MetricDefinitions in ModelTrainer (#5202)

* feat: Add support for MetricDefinitions in ModelTrainer

* style fix

* Update model_trainer.py to generate the doc

* resolve unit test failed

* solve another unit test error

---------

Co-authored-by: Chad Chiang <[email protected]>

* prepare release v2.247.0

* update development version to v2.247.1.dev0

* change: update image_uri_configs 06-19-2025 07:18:34 PST

* prepare release v2.247.1

* update development version to v2.247.2.dev0

* change: relax protobuf to <6.32 (#5211)

* change: update image_uri_configs 06-26-2025 07:18:35 PST

* feature: integrate amtviz for visualization of tuning jobs (#5044)

* feature: integrate amtviz for visualization of tuning jobs

* Move RecordSerializer and RecordDeserializer to sagemaker.serializers and sagemaker.deserializers (#5037)

* Move RecordSerializer and RecordDeserializer to sagemaker.serializers and sagemaker.deserializers

* fix codestyle

* fix test

---------

Co-authored-by: pintaoz <[email protected]>

* Add framework_version to all TensorFlowModel examples (#5038)

* Add framework_version to all TensorFlowModel examples

* update framework_version to x.x.x

---------

Co-authored-by: pintaoz <[email protected]>

* Fix hyperparameter strategy docs (#5045)

* fix: pass in inference_ami_version to model_based endpoint type (#5043)

* fix: pass in inference_ami_version to model_based endpoint type

* documentation: update contributing.md w/ venv instructions and pip install fixes

---------

Co-authored-by: Zhaoqi <[email protected]>

* Add warning about not supporting torch.nn.SyncBatchNorm (#5046)

* Add warning about not supporting

* update wording

---------

Co-authored-by: pintaoz <[email protected]>

* prepare release v2.239.2

* update development version to v2.239.3.dev0

* change: update image_uri_configs  02-19-2025 06:18:15 PST

* fix: codestyle, type hints, license, and docstrings

* documentation: add docstring for amtviz module

* fix: fix docstyle and flake8 errors

* fix: code reformat using black

---------

Co-authored-by: Uemit Yoldas <[email protected]>
Co-authored-by: pintaoz-aws <[email protected]>
Co-authored-by: pintaoz <[email protected]>
Co-authored-by: parknate@ <[email protected]>
Co-authored-by: timkuo-amazon <[email protected]>
Co-authored-by: Zhaoqi <[email protected]>
Co-authored-by: ci <ci>
Co-authored-by: sagemaker-bot <[email protected]>

* change: update image_uri_configs 07-04-2025 07:18:27 PST

* Update TF DLC python version to py312 (#5231)

* Update TF DLC python version to py312

* catch integ version

* Bump SMD version to enable custom workflow deployment. (#5230)

* Bump SMD version to enable custom workflow deployment.

* Update SMD image uri UT.

---------

Co-authored-by: Joseph Zhang <[email protected]>

* Adding Hyperpod feature to enable hyperpod telemetry

* Adding Hyperpod feature to enable hyperpod telemetry (#5235)

* Adding Hyperpod feature to enable hyperpod telemetry

* Adding Hyperpod feature to enable hyperpod telemetry

---------

Co-authored-by: Roja Reddy Sareddy <[email protected]>

* fix: sanitize git clone repo input url (#5234)

* build(deps): bump torch in /tests/data/modules/script_mode (#5189)

Bumps [torch](https://github.com/pytorch/pytorch) from 2.0.1+cpu to 2.7.0.
- [Release notes](https://github.com/pytorch/pytorch/releases)
- [Changelog](https://github.com/pytorch/pytorch/blob/main/RELEASE.md)
- [Commits](https://github.com/pytorch/pytorch/commits/v2.7.0)

---
updated-dependencies:
- dependency-name: torch
  dependency-version: 2.7.0
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: parknate@ <[email protected]>

* build(deps): bump mlflow in /tests/data/serve_resources/mlflow/xgboost (#5218)

Bumps [mlflow](https://github.com/mlflow/mlflow) from 2.13.2 to 3.1.0.
- [Release notes](https://github.com/mlflow/mlflow/releases)
- [Changelog](https://github.com/mlflow/mlflow/blob/master/CHANGELOG.md)
- [Commits](https://github.com/mlflow/mlflow/compare/v2.13.2...v3.1.0)

---
updated-dependencies:
- dependency-name: mlflow
  dependency-version: 3.1.0
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: parknate@ <[email protected]>

* build(deps): bump protobuf from 4.25.5 to 4.25.8 in /requirements/extras (#5209)

Bumps [protobuf](https://github.com/protocolbuffers/protobuf) from 4.25.5 to 4.25.8.
- [Release notes](https://github.com/protocolbuffers/protobuf/releases)
- [Changelog](https://github.com/protocolbuffers/protobuf/blob/main/protobuf_release.bzl)
- [Commits](https://github.com/protocolbuffers/protobuf/compare/v4.25.5...v4.25.8)

---
updated-dependencies:
- dependency-name: protobuf
  dependency-version: 4.25.8
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: parknate@ <[email protected]>

* build(deps): bump requests in /tests/data/serve_resources/mlflow/pytorch (#5200)

Bumps [requests](https://github.com/psf/requests) from 2.32.2 to 2.32.4.
- [Release notes](https://github.com/psf/requests/releases)
- [Changelog](https://github.com/psf/requests/blob/main/HISTORY.md)
- [Commits](https://github.com/psf/requests/compare/v2.32.2...v2.32.4)

---
updated-dependencies:
- dependency-name: requests
  dependency-version: 2.32.4
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: parknate@ <[email protected]>

* prepare release v2.248.0

* update development version to v2.248.1.dev0

* Nova training support (#5238)

* feature: Added Amazon Nova training support for ModelTrainer and Estimator

Co-authored-by: Erick Benitez-Ramos <[email protected]>

* prepare release v2.248.1

* update development version to v2.248.2.dev0

* change: When rootlessDocker is enabled, return a fixed SageMaker IP (#5236)

* change: When rootlessDocker is enabled, return a fixed SageMaker IP

* Add logging for docker info command failure

---------

Co-authored-by: Jiali Xing <[email protected]>
Co-authored-by: Gokul Anantha Narayanan <[email protected]>

* fix: add hard dependency on sagemaker-core pypi lib (#5241)

* change: update image_uri_configs 07-18-2025 07:18:28 PST

* change: update image_uri_configs 07-22-2025 07:18:25 PST

* Relax boto3 version requirement (#5245)

* prepare release v2.248.2

* update development version to v2.248.3.dev0

* change: update image_uri_configs 07-23-2025 07:18:25 PST

* Directly use customer-provided endpoint name for ModelBuilder deployment. (#5246)

* Directly use customer-provided endpoint name for deployment in ModelBuilder.

* Fix ModelBuilder UTs after removing unique_name_from_base import.

---------

Co-authored-by: Joseph Zhang <[email protected]>

* feature: AWS Batch for SageMaker Training jobs (#5249)

---------

Co-authored-by: Greg Katkov <[email protected]>
Co-authored-by: haoxinwa <[email protected]>
Co-authored-by: JennaZhao <[email protected]>
Co-authored-by: Jessica Zhu <[email protected]>
Co-authored-by: David Lindskog <[email protected]>

* prepare release v2.249.0

* update development version to v2.249.1.dev0

* Add more constraints to test requirements (#5254)

* Add constraint file to test requirements

* Add constraints

---------

Co-authored-by: pintaoz <[email protected]>

* feature: Add support for InstancePlacementConfig in Estimator for training jobs running on ultraserver capacity (#5259)

---------

Co-authored-by: Greg Katkov <[email protected]>

* prepare release v2.250.0

* update development version to v2.250.1.dev0

* feat: support pipeline versioning (#5248)

Co-authored-by: Namrata Madan <[email protected]>
Co-authored-by: Gokul Anantha Narayanan <[email protected]>

* add sleep for model deployment (#5260)

* fix: dockerfile stuck on interactive shell (#5261)

* GPT OSS Hotfix (#5263)

* changes for gpt_oss jobs support

* added unit tests

* fixing unit test

* prepare release v2.251.0

* update development version to v2.251.1.dev0

* chore: onboard tei 1.8.0 (#5265)

* chore: onboard tei 1.8.0

* chore: fix tei tests

* feature: Greenland support for SagemakerTraining jobs (#1737)

* feature: Greenland support for SagemakerTraining jobs

* Added docstrings, tests to show sample filters for list_jobs

* Fixed linting error

* Addressed comments on rev1

* Removed unused attributes: retry_config and role_arn

* Fixes identified during end to end test

* Removed unused import

* prepare release v2.251.1

* update development version to v2.251.2.dev0

* latest tgi (#5255)

* latest tgi

* add optimum-neuron tgi

---------

Co-authored-by: sage-maker <[email protected]>

* Feature/js mlops telemetry (#5268)

* removed log statement

* added telemetry for js and mlops

* added for js estimator

* fixed unit tests

---------

Co-authored-by: Mohamed Zeidan <[email protected]>

* feature: add eval custom lambda arn to hyperparameters (#5272)

* fix: add retryable option to emr step in SageMaker Pipelines (#5281)

* Add nova custom lambda in hyperparameter from estimator (#5282)

* Add nova custom lambda in hyperparameter from estimator

* Add nova custom lambda in hyperparameter from estimator

* feat: change S3 endpoint env name (#5264)

* fix: handle trial component status message longer than API supports (#5276)

* merge rba without the iso region changes (#5290)

* change: update image_uri_configs 08-28-2025 07:18:37 PST

* change: update image_uri_configs 09-03-2025 07:18:37 PST

* change: update image_uri_configs 09-05-2025 07:18:30 PST

* change: update jumpstart region_config 09-17-2025 07:18:39 PST

* Revert "change: update image_uri_configs 08-28-2025 07:18:37 PST"

This reverts commit 96ea39db00c36050cc5478bd13f14e8c5f9347db.

---------

Co-authored-by: sagemaker-bot <[email protected]>
Co-authored-by: Eli Davidson <[email protected]>

* Remove tags field from greenland job submitter (#1738)

* Remove tags field from greenland job submitter

* Update tests

* allow is_production to be passed in

* Add response in message

---------

Co-authored-by: JieShen Ong <[email protected]>

* prepare release v2.252.0

* update development version to v2.252.1.dev0

* feature: add model_type hyperparameter support for Nova recipes (#5291)

Co-authored-by: xibei chen <[email protected]>

* Fix flaky integ test (#5294)

Co-authored-by: pintaoz <[email protected]>

* fix: djl regions fixes #5273 (#5277)

* test: adds unit test for djl lmi regions

* test: adds regions in which djl images do not exist

* fix: adds djl missing regions

* fix: linting

* docs: update contributing to add linting section

---------

Co-authored-by: pintaoz-aws <[email protected]>

* Adding default identity implementations to InferenceSpec (#5278)

Co-authored-by: pintaoz-aws <[email protected]>

* feature: Added condition to allow eval recipe. (#5298)

* feature: Added condition to allow eval recipe.

* change: renamed is_nova_recipe to is_nova_or_eval_recipe

* chore: domain support for eu-isoe-west-1 (#5292)

* Add numpy 2.0 support (#5199)

* Add numpy 2.0 support

* Fix incompatible_dependecies test

* update tensorflow artifacts

* testfile codestyle fixes

* update SKLearn image URI config

* docstyle fixes

* numpy fixes

---------

Co-authored-by: Roja Reddy Sareddy <[email protected]>
Co-authored-by: parknate@ <[email protected]>
Co-authored-by: Gokul Anantha Narayanan <[email protected]>

* Fix for a failed slow test: numpy fix (#5304)

* Add numpy 2.0 support

* Fix incompatible_dependecies test

* update tensorflow artifacts

* testfile codestyle fixes

* update SKLearn image URI config

* docstyle fixes

* numpy fixes

* numpy fix for slow test

---------

Co-authored-by: Roja Reddy Sareddy <[email protected]>
Co-authored-by: parknate@ <[email protected]>
Co-authored-by: Gokul Anantha Narayanan <[email protected]>

* Revert "Merge branch 'master-greenland' into master"

This reverts commit 0ffec8a63af96c35f10663cd60832a807c0f6e16, reversing
changes made to 6414203828e8c32bcae868b9ae18c172e8aedf38.

* prepare release v2.253.0

* update development version to v2.253.1.dev0

* add TEI 1.8.2 (#5305)

* add TEI 1.8.2

* add test

* [hf-tei] add image uri to utils (#5287)

* tei

* tests

---------

Co-authored-by: pintaoz-aws <[email protected]>
Co-authored-by: Molly He <[email protected]>

* Revert the change "Add Numpy 2.0 support" (#5307)

* Add numpy 2.0 support

* Fix incompatible_dependecies test

* update tensorflow artifacts

* testfile codestyle fixes

* update SKLearn image URI config

* docstyle fixes

* numpy fixes

* numpy fix for slow test

* Revert 'Add numpy 2.0 support'

---------

Co-authored-by: Roja Reddy Sareddy <[email protected]>
Co-authored-by: parknate@ <[email protected]>
Co-authored-by: Gokul Anantha Narayanan <[email protected]>

* Update instance type regex to also include hyphens (#5308)

* Revert "Merge branch 'master-greenland' into master" (#1747)

This reverts commit 0ffec8a63af96c35f10663cd60832a807c0f6e16, reversing
changes made to 6414203828e8c32bcae868b9ae18c172e8aedf38.

* prepare release v2.253.1

* update development version to v2.253.2.dev0

* [hf] HF Inference TGI  (#5302)

* image

* tests

---------

Co-authored-by: Gokul Anantha Narayanan <[email protected]>

* [Hugging Face][Pytorch] Inference DLC 4.51.3 (#5271)

* new image

* Update src/sagemaker/image_uri_config/huggingface.json

removed missing CPU image

* add cpu back

---------

Co-authored-by: Molly He <[email protected]>

* add HF Optimum Neuron DLCs (#5309)

* add image

* inf on dlc

* neuron tgi dlcs

* fix test

---------

Co-authored-by: Zhaoqi <[email protected]>

* feat: Triton v25.09 DLC (#5314)

* Add Numpy 2.0 support (#5311)

* Add numpy 2.0 support

* Fix incompatible_dependecies test

* update tensorflow artifacts

* testfile codestyle fixes

* update SKLearn image URI config

* docstyle fixes

* numpy fixes

* numpy fix for slow test

* Revert 'Add numpy 2.0 support'

* Add numpy 2.0 support

---------

Co-authored-by: Roja Reddy Sareddy <[email protected]>
Co-authored-by: parknate@ <[email protected]>
Co-authored-by: Gokul Anantha Narayanan <[email protected]>

* prepare release v2.254.0

* update development version to v2.254.1.dev0

* [hf] HF PT Training DLCs (#5301)

* image

* add py312

* fix

* test fix

* typo

---------

Co-authored-by: Molly He <[email protected]>

* fix: update get_execution_role to directly return the ExecutionRoleArn if it presents in the resource metadata file (#5315)

Co-authored-by: Jun Lyu <[email protected]>

* Updating RegisterModel step with new params (#1766)

* Adding Model package registration field for registermodel step

* Adding base model for register model step

* Fixes for baseModel

* Fix unit tests

* Adding tests for model, fixing checkstyle

* Modifying BaseModel to ContainerBaseModel

* Fixes

* Fixing checkstyle

* fixing imports

* fix for unit test

* Fix unit tests

* Keynote3 kandinsky (#1833)

* feature: Add support for SFT recipes

- Add unit tests

* Rename model_type and recipe

* Feat: Add support for LLMFT in ModelTrainer and add unit tests

* Upload verl recipe to S3 along with llmft

* Update llmft check to reflect the new recipe structure

---------

Co-authored-by: Ankita Agarwal <[email protected]>
Co-authored-by: Ankita Agarwal <[email protected]>
Co-authored-by: appari <[email protected]>

* zimmer merged to master-v2

* numpy fix

* fixed numpy version

* more merge conflict

* more merge conflicts

* estimator fix

* requirements conflict

* pyproj fix

* smcore<2.0.0

* Restrict sagemaker-core version to less than 2.0.0 (#1917)

* malav changes w smcore<2.0.0

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: pintaoz-aws <[email protected]>
Co-authored-by: pintaoz <[email protected]>
Co-authored-by: parknate@ <[email protected]>
Co-authored-by: timkuo-amazon <[email protected]>
Co-authored-by: Zhaoqi <[email protected]>
Co-authored-by: ci <ci>
Co-authored-by: sagemaker-bot <[email protected]>
Co-authored-by: IshaChid76 <[email protected]>
Co-authored-by: Isha Chidrawar <[email protected]>
Co-authored-by: Malav Shastri <[email protected]>
Co-authored-by: malavhs <[email protected]>
Co-authored-by: Ben Crabtree <[email protected]>
Co-authored-by: Erick Benitez-Ramos <[email protected]>
Co-authored-by: cj-zhang <[email protected]>
Co-authored-by: Joseph Zhang <[email protected]>
Co-authored-by: rsareddy0329 <[email protected]>
Co-authored-by: Roja Reddy Sareddy <[email protected]>
Co-authored-by: Keshav Chandak <[email protected]>
Co-authored-by: Keshav Chandak <[email protected]>
Co-authored-by: Rohan Narayan <[email protected]>
Co-authored-by: varunmoris <[email protected]>
Co-authored-by: Gary Wang <[email protected]>
Co-authored-by: Bruno Pistone <[email protected]>
Co-authored-by: Rohan Gujarathi <[email protected]>
Co-authored-by: Rohan Gujarathi <[email protected]>
Co-authored-by: Julian Grimm <[email protected]>
Co-authored-by: Gokul Anantha Narayanan <[email protected]>
Co-authored-by: rrrkharse <[email protected]>
Co-authored-by: evakravi <[email protected]>
Co-authored-by: Victor Zhu <[email protected]>
Co-authored-by: ruiliann666 <[email protected]>
Co-authored-by: Namrata Madan <[email protected]>
Co-authored-by: Namrata Madan <[email protected]>
Co-authored-by: jkasiraj <[email protected]>
Co-authored-by: Brock Wade <[email protected]>
Co-authored-by: Brock Wade <[email protected]>
Co-authored-by: Pravali Uppugunduri <[email protected]>
Co-authored-by: Molly He <[email protected]>
Co-authored-by: Pravali Uppugunduri <[email protected]>
Co-authored-by: Aditi Sharma <[email protected]>
Co-authored-by: adishaa <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Roman A <[email protected]>
Co-authored-by: David Tippett <[email protected]>
Co-authored-by: Prateek M Desai <[email protected]>
Co-authored-by: Ubuntu <[email protected]>
Co-authored-by: pagezyhf <[email protected]>
Co-authored-by: zicanl-amazon <[email protected]>
Co-authored-by: DemyCode <[email protected]>
Co-authored-by: haozhx23 <[email protected]>
Co-authored-by: Ubuntu <[email protected]>
Co-authored-by: Ubuntu <[email protected]>
Co-authored-by: Mohan Kishore <[email protected]>
Co-authored-by: Mohan Kishore <[email protected]>
Co-authored-by: Will Childs-Klein <[email protected]>
Co-authored-by: Chad Chiang <[email protected]>
Co-authored-by: Chad Chiang <[email protected]>
Co-authored-by: uyoldas <[email protected]>
Co-authored-by: Uemit Yoldas <[email protected]>
Co-authored-by: Sirut Buasai <[email protected]>
Co-authored-by: Tritin Truong <[email protected]>
Co-authored-by: Jiali Xing <[email protected]>
Co-authored-by: Jiali Xing <[email protected]>
Co-authored-by: papriwal <[email protected]>
Co-authored-by: Greg Katkov <[email protected]>
Co-authored-by: haoxinwa <[email protected]>
Co-authored-by: JennaZhao <[email protected]>
Co-authored-by: Jessica Zhu <[email protected]>
Co-authored-by: David Lindskog <[email protected]>
Co-authored-by: Greg Katkov <[email protected]>
Co-authored-by: adtian2 <[email protected]>
Co-authored-by: Kamalakannan Hari Krishna Moorthy <[email protected]>
Co-authored-by: Mohamed Zeidan <[email protected]>
Co-authored-by: Tim Tang <[email protected]>
Co-authored-by: Timothy Wu <[email protected]>
Co-authored-by: Cuong Vu <[email protected]>
Co-authored-by: Dana Benson <[email protected]>
Co-authored-by: Eli Davidson <[email protected]>
Co-authored-by: Eli Davidson <[email protected]>
Co-authored-by: Jie Shen Ong <[email protected]>
Co-authored-by: JieShen Ong <[email protected]>
Co-authored-by: sylvie7788 <[email protected]>
Co-authored-by: xibei chen <[email protected]>
Co-authored-by: Malte Reimann <[email protected]>
Co-authored-by: aviruthen <[email protected]>
Co-authored-by: chiragvp-aws <[email protected]>
Co-authored-by: Gokul A <[email protected]>
Co-authored-by: Zhaoqi <[email protected]>
Co-authored-by: Andrew Song <[email protected]>
Co-authored-by: JunLyu <[email protected]>
Co-authored-by: Jun Lyu <[email protected]>
Co-authored-by: Madhubalasri-B <[email protected]>
Co-authored-by: CHANG-NING TSAI <[email protected]>
Co-authored-by: Ankita Agarwal <[email protected]>
Co-authored-by: Ankita Agarwal <[email protected]>
Co-authored-by: appari <[email protected]>
mohamedzeidan2021 added a commit that referenced this pull request Dec 3, 2025
* kandinsky, nova, zimmer to Keynote 3 v2 (#1924)

* Add framework_version to all TensorFlowModel examples (#5038)

* Add framework_version to all TensorFlowModel examples

* update framework_version to x.x.x

---------

Co-authored-by: pintaoz <[email protected]>

* Fix hyperparameter strategy docs (#5045)

* fix: pass in inference_ami_version to model_based endpoint type (#5043)

* fix: pass in inference_ami_version to model_based endpoint type

* documentation: update contributing.md w/ venv instructions and pip install fixes

---------

Co-authored-by: Zhaoqi <[email protected]>

* Add warning about not supporting torch.nn.SyncBatchNorm (#5046)

* Add warning about not supporting

* update wording

---------

Co-authored-by: pintaoz <[email protected]>

* prepare release v2.239.2

* update development version to v2.239.3.dev0

* change: update image_uri_configs  02-19-2025 06:18:15 PST

* change: added ap-southeast-7 and mx-central-1 for Jumpstart (#5049)

* added ap-southeast-7 and mx-central-1 for Jumpstart

* added BKK dlc to djl-neuronx

---------

Co-authored-by: Isha Chidrawar <[email protected]>

* prepare release v2.239.3

* update development version to v2.239.4.dev0

* change: update image_uri_configs  02-20-2025 06:18:08 PST

* feat: Add support for TGI Neuronx 0.0.27 and HF PT 2.3.0 image in PySDK (#5050)

Co-authored-by: malavhs <[email protected]>

* Add backward compatibility for RecordSerializer and RecordDeserializer (#5052)

* Add backward compatibility for RecordSerializer and RecordDeserializer

* fix circular import

* fix test

---------

Co-authored-by: pintaoz <[email protected]>

* py_version doc fixes (#5048)

* change: update image_uri_configs  02-21-2025 06:18:10 PST

* fix: altconfig hubcontent and reenable integ test (#5051)

* fix altconfig hubcontent and reenable integ test

* linting

* update exception thrown

* feat: Add support for TGI Neuronx 0.0.27 and HF PT 2.3.0 image in PySDK (#5050)

Co-authored-by: malavhs <[email protected]>

* add test

* update predictor spec accessor

* lint

* set custom field from HCD config to model spec data class

* lint

* remove logs

* last update

---------

Co-authored-by: Malav Shastri <[email protected]>
Co-authored-by: malavhs <[email protected]>

* fix: forbid extras in Configs (#5042)

* fix: make configs safer

* fix: safer destructor in ModelTrainer

* format

* Update error message

* pylint

* Create BaseConfig

* Remove main function entrypoint in ModelBuilder dependency manager. (#5058)

* Remove main function entrypoint in ModelBuilder dependency manager.

* Remove main function entrypoint in ModelBuilder dependency manager.

---------

Co-authored-by: Joseph Zhang <[email protected]>

* documentation: Removed a line about python version requirements of training script which can misguide users. (#5057)

* change: Allow telemetry only in supported regions

* documentation: Removed a line about Python version requirements of the training script that can mislead users. The training script can be of the latest version supported by the container's framework_version

---------

Co-authored-by: Roja Reddy Sareddy <[email protected]>

* prepare release v2.240.0

* update development version to v2.240.1.dev0

* Fix key error in _send_metrics() (#5068)

Co-authored-by: pintaoz <[email protected]>

* fix: Added check for the presence of model package group before creating one (#5063)

Co-authored-by: Keshav Chandak <[email protected]>

* Use sagemaker session's s3_resource in download_folder (#5064)

Co-authored-by: pintaoz <[email protected]>

* Fix error when there is no session to call _create_model_request() (#5062)

* Fix error when there is no session to call _create_model_request()

* Fix codestyle

---------

Co-authored-by: pintaoz <[email protected]>

* Ensure Model.is_repack() returns a boolean (#5060)

* Ensure Model.is_repack() returns a boolean

* update test

---------

Co-authored-by: pintaoz <[email protected]>

* feat: Allow ModelTrainer to accept hyperparameters file (#5059)

* Allow ModelTrainer to accept hyperparameter file and create Hyperparameter class

* pylint

* Detect hyperparameters from contents rather than file extension

* pylint

* change: add integs

* change: add integs

* change: remove custom hyperparameter tooling

* Add tests for hp contracts

* change: add unit tests and remove unreachable condition

* fix integs

* doc check fix

* fix tests

* fix tox.ini

* add unit test

* feature: support training for JumpStart model references as part of Curated Hub Phase 2 (#5070)

* change: update image_uri_configs  01-27-2025 06:18:13 PST

* fix: skip TF tests for unsupported versions (#5007)

* fix: skip TF tests for unsupported versions

* flake8

* change: update image_uri_configs  01-29-2025 06:18:08 PST

* chore: add new images for HF TGI (#5005)

* feat: add pytorch-tgi-inference 2.4.0

* add tgi 3.0.1 image

* skip faulty test

* formatting

* formatting

* add hf pytorch training 4.46

* update version alias

* add py311 to training version

* update tests with pyversion 311

* formatting

---------

Co-authored-by: Erick Benitez-Ramos <[email protected]>

* feat: use jumpstart deployment config image as default optimization image (#4992)

Co-authored-by: Erick Benitez-Ramos <[email protected]>

* prepare release v2.238.0

* update development version to v2.238.1.dev0

* Fix ssh host policy (#4966)

* Fix ssh host policy

* Filter policy by algo-

* Add docstring

* Fix pylint

* Fix docstyle summary

* Unit test

* Fix unit test

* Change to unit test

* Fix unit tests

* Test comment out flaky tests

* Readd the flaky tests

* Remove flaky asserts

* Remove flaky asserts

---------

Co-authored-by: Erick Benitez-Ramos <[email protected]>

* change: Allow telemetry only in supported regions (#5009)

* change: Allow telemetry only in supported regions

* change: Allow telemetry only in supported regions

* change: Allow telemetry only in supported regions

* change: Allow telemetry only in supported regions

* change: Allow telemetry only in supported regions

---------

Co-authored-by: Roja Reddy Sareddy <[email protected]>

* mpirun protocol - distributed training with @remote decorator (#4998)

* implemented multi-node distribution with @remote function

* completed unit tests

* added distributed training with CPU and torchrun

* backwards compatibility nproc_per_node

* fixing code: permissions for non-root users, integration tests

* fixed docstyle

* refactor nproc_per_node for backwards compatibility

* refactor nproc_per_node for backwards compatibility

* pylint fix, newlines

* added unit tests for bootstrap_environment remote

* added mpirun protocol for distributed training with @remote decorator

* aligned mpi_utils_remote.py to mpi_utils.py for estimator

* updated docstring for sagemaker sdk doc

---------

Co-authored-by: Erick Benitez-Ramos <[email protected]>

* feat: Add support for deepseek recipes (#5011)

* feat: Add support for deepseek recipes

* pylint

* add unit test

* feat: [JumpStart] Add access configs and training instance type variants artifact uri handling for Curated Hub Phase 2 training integration (#1653)

* Add access config to training input for Curated Hub Training Integration

* Add support to retrieve instance specific training artifact keys

* Fix some typos and naming issues

* Fix more typos

* fix formatting issues with black

* modify access config logic so accept_eula is passed into fit

* update black formatting

* Add more unit tests for passing access configs

* fix style errors

* fix for failing integ test

* fix styles and integ test error

* skip blocking integ test

* fix formatting

* remove env vars when access configs are being used

* fix docstyle issue

* update usage of access configs, remove conversion of training artifact key to uri

* fix styling issues

* fix styling issues

* fix unit tests

* fix adding hubaccessconfig only if hubcontentarn exists

* move logic to JumpStartEstimator from Job

* Fix styling issues

* Remove unused code

* fix styling issues

* fix unit test failure

* fix some formatting, add comments

* remove typing for estimator in get_access_configs function

* fix circular import dependency

* fix styling issues

---------

Co-authored-by: Erick Benitez-Ramos <[email protected]>

* Always add code channel, regardless of network isolation (#1657)

* fix formatting issue

* fix formatting issue

* fix formatting issue

* fix tensorflow file

---------

Co-authored-by: sagemaker-bot <[email protected]>
Co-authored-by: Erick Benitez-Ramos <[email protected]>
Co-authored-by: varunmoris <[email protected]>
Co-authored-by: Gary Wang <[email protected]>
Co-authored-by: ci <ci>
Co-authored-by: parknate@ <[email protected]>
Co-authored-by: rsareddy0329 <[email protected]>
Co-authored-by: Roja Reddy Sareddy <[email protected]>
Co-authored-by: Bruno Pistone <[email protected]>

* feat: Make DistributedConfig Extensible (#5039)

* feat: Make DistributedConfig Extensible

* pylint

* Include none types when creating config jsons for safer reference

* fix: update test to account for changes

* format

* Add integ test

* pylint

* prepare release v2.240.0

* update development version to v2.240.1.dev0

* Fix key error in _send_metrics() (#5068)

Co-authored-by: pintaoz <[email protected]>

* fix: Added check for the presence of model package group before creating one (#5063)

Co-authored-by: Keshav Chandak <[email protected]>

* Use sagemaker session's s3_resource in download_folder (#5064)

Co-authored-by: pintaoz <[email protected]>

* remove union

* fix merge artifact

* Change dir path to distributed_drivers

* update paths

---------

Co-authored-by: ci <ci>
Co-authored-by: pintaoz-aws <[email protected]>
Co-authored-by: pintaoz <[email protected]>
Co-authored-by: Keshav Chandak <[email protected]>
Co-authored-by: Keshav Chandak <[email protected]>

* Skip tests with deprecated instance type (#5077)

Co-authored-by: pintaoz <[email protected]>

* prepare release v2.241.0

* update development version to v2.241.1.dev0

* pipeline definition function doc update (#5074)

Co-authored-by: Rohan Gujarathi <[email protected]>

* feat: add integ tests for training JumpStart models in private hub (#5076)

* feat: add integ tests for training JumpStart models in private hub

* fixed formatting

* remove unused imports

* fix unused imports

* fix unit test failure and fix bug around versioning

* fix formatting

* fix unit tests

* fix model_uri usage issue

* fix some formatting

* separate private hub setup code

* add try catch block

* fix flake8 issue so except clause is not bare

* black formatting

* fix: resolve infinite loop in _find_config on Windows systems (#4970)

* fix: resolve Windows path handling in _find_config

* Replace Path.match("/") with Path.anchor comparison
* Fix infinite loop in _studio.py path traversal
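
A minimal sketch of the anchor-based root check, assuming the loop walks up parent directories until it reaches the filesystem root. The helper name `iter_ancestors` is hypothetical and this is not the SDK's actual `_find_config` code; it only illustrates why comparing against `anchor` terminates on both POSIX and Windows paths:

```python
from pathlib import PurePath, PurePosixPath, PureWindowsPath
from typing import Iterator


def iter_ancestors(start: PurePath) -> Iterator[PurePath]:
    """Yield `start` and each parent directory, stopping at the root.

    Hypothetical helper for illustration only.
    """
    current = start
    while True:
        yield current
        # At the root, a path is equal to its own anchor ("/" or "C:\\").
        # The parent check also stops relative paths, whose anchor is "".
        if str(current) == current.anchor or current.parent == current:
            break
        current = current.parent


# Pure paths are used so the sketch behaves the same on any OS.
posix_chain = [str(p) for p in iter_ancestors(PurePosixPath("/a/b"))]
windows_chain = [str(p) for p in iter_ancestors(PureWindowsPath("C:\\Users\\me"))]
```

On Windows the root of a drive is `C:\` rather than `/`, so a termination check written only against `/` may never succeed there, whereas the anchor comparison works for both flavours.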

* test: Add tests for the new root path exploration

* Fix formatting style

* Fixed line too long

* Fix docstyle by running black manually

* Fix testcase with \\ when running on non-windows machines

* Fix formatting style

* cleanup unused import

* change: update image_uri_configs  03-11-2025 07:18:09 PST

* Fixing Pytorch training python version in tests (#5084)

* Fixing Pytorch training python version in tests

* Updating Inference test handling

* remove s3 output location requirement from hub class init (#5081)

* remove s3 output location requirement from hub class init

* fix integ test hub

* lint

* fix test

---------

Co-authored-by: Gokul Anantha Narayanan <[email protected]>

* fix: Prevent RunContext overlap between test_run tests (#5083)

Co-authored-by: Gokul Anantha Narayanan <[email protected]>

* Torch upgrade (#5086)

* Fix Flake8 Violations

* UPDATE PYTORCH VERSION TO ADDRESS SECURITY RISK

**Description**

The currently used PyTorch version has a possible vulnerability.

Internal - https://tiny.amazon.com/p5i4jla1

**Testing Done**

Unit and Integration tests in the CodeBuild

* Revert CPU Versions

* Test Fix

* Codestyle fixes

* debug attempt

* Fixes

* Fix

* Fix

* prepare release v2.242.0

* update development version to v2.242.1.dev0

* add new regions to JUMPSTART_LAUNCHED_REGIONS (#5089)

Co-authored-by: isha chidrawar <[email protected]>
Co-authored-by: Gokul Anantha Narayanan <[email protected]>

* ADD Documentation to ReadtheDocs for Upgrading torch versions (#5090)

* ADD Documentation to ReadtheDocs for Upgrading torch versions

**Description**

**Testing Done**
Only documentation updates

* Fix for Codestyle

* Remove unused import

* Flake8 Fix

* CodeStyle Fixes

* feature: Enabled update_endpoint through model_builder (#5085)

* feature: Enabled update_endpoint through model_builder

* fix: fix unit test, black-check, pylint errors

* fix: fix black-check, pylint errors

---------

Co-authored-by: Roja Reddy Sareddy <[email protected]>

* fix: factor in set instance type when building JumpStart models in ModelBuilder. (#5093)

* Remove main function entrypoint in ModelBuilder dependency manager.

* Remove main function entrypoint in ModelBuilder dependency manager.

* fix: factor in set instance type when building JumpStart models in ModelBuilder.

* Remove default instance type from ModelBuilder.

* Restore default instance type. Tweak integ test.

---------

Co-authored-by: Joseph Zhang <[email protected]>

* change: update image_uri_configs  03-21-2025 07:17:55 PST

* Skip tests failed due to deprecated instance type (#5097)

Co-authored-by: pintaoz <[email protected]>

* Feat: Added support for returning the most recently created approved model package in a group (#5092)

Co-authored-by: Keshav Chandak <[email protected]>

* change: update image_uri_configs  03-25-2025 07:18:13 PST

* chore: fix integ tests to use latest version of model (#5104)

* change: update image_uri_configs  03-26-2025 07:18:16 PST

* Update Jinja version (#5101)

* Aligned disable_output_compression for @remote with Estimator (#5094)

* Update transformers version (#5102)

* fix: use temp file in unit tests (#5106)

* fix: fix flaky spark processor integ (#5109)

* fix: fix flaky spark processor integ

* format

* fix: fix flaky clarify model monitor test (#5107)

* chore: move jumpstart region definitions to json file (#5095)

* chore: move jumpstart region definitions to json file

* chore: address formatting issues

* fix: neo regions not ga in 5 regions

* chore: make variable private

---------

Co-authored-by: Erick Benitez-Ramos <[email protected]>

* change: Update for PT 2.5.1, SMP 2.8.0 (#5071)

* prepare release v2.243.0

* update development version to v2.243.1.dev0

* fix: flaky test (#5111)

* chore: fix semantic versioning for wildcard identifier (#5105)

* Add mlflow tracking arn telemetry (#5113)

Integ test failure is aligned with CI health

* Master (#5112)

* fix integ test hub

* lint

* fix jumpstart curated hub bugs

* lint

* fix tests

* linting

* lint

* rm test file

* fix test

* fix

* lint

* remove test

* update for test

* documentation: update ModelStep data dependency info (#5120)

Co-authored-by: Namrata Madan <[email protected]>

* Update instance gpu info (#5119)

* fix: remove historical job_name caching which causes long job name (#5118)

* Fix issue #4856 by copying environment variables (#5115)

* Fix issue #4856 by copying environment variables

* Added handler for pipeline variable while creating process job (#5122)

* change: Allow telemetry only in supported regions

* change: Allow telemetry only in supported regions

* change: Allow telemetry only in supported regions

* change: Allow telemetry only in supported regions

* change: Allow telemetry only in supported regions

* documentation: Removed a line about python version requirements of training script which can misguide users. Training script can be of latest version based on the support provided by framework_version of the container

* feature: Enabled update_endpoint through model_builder

* fix: fix unit test, black-check, pylint errors

* fix: fix black-check, pylint errors

* fix: Added handler for pipeline variable while creating process job

* fix: Added handler for pipeline variable while creating process job

---------

Co-authored-by: Roja Reddy Sareddy <[email protected]>

* documentation: update pipelines step caching examples to include more steps (#5121)

Co-authored-by: Brock Wade <[email protected]>

* prepare release v2.243.1

* update development version to v2.243.2.dev0

* Fix deepdiff dependencies (#5128)

* Fix deepdiff dependencies

* trigger tests

* Fix: fix the issue due to PR changes, 5122 (#5124)

* change: Allow telemetry only in supported regions

* change: Allow telemetry only in supported regions

* change: Allow telemetry only in supported regions

* change: Allow telemetry only in supported regions

* change: Allow telemetry only in supported regions

* documentation: Removed a line about python version requirements of training script which can misguide users. Training script can be of latest version based on the support provided by framework_version of the container

* feature: Enabled update_endpoint through model_builder

* fix: fix unit test, black-check, pylint errors

* fix: fix black-check, pylint errors

* fix: Added handler for pipeline variable while creating process job

* fix: Added handler for pipeline variable while creating process job

* Revert the PR changes: #5122, due to issue https://t.corp.amazon.com/P223568185/overview

* Fix: fix the issue, https://t.corp.amazon.com/P223568185/communication

---------

Co-authored-by: Roja Reddy Sareddy <[email protected]>

* fix: tgi image uri unit tests (#5127)

* fix: tgi image uri unit tests

* fix: black-format and flake8 failures

* fix: parse

* fix: print statement

---------

Co-authored-by: Erick Benitez-Ramos <[email protected]>

* prepare release v2.243.2

* update development version to v2.243.3.dev0

* change: update image_uri_configs 04-11-2025 07:18:19 PST

* change: update image_uri_configs 04-15-2025 07:18:10 PST

* change: update image_uri_configs 04-16-2025 07:18:18 PST

* update pr test to deprecate py38 and add py312 (#5133)

* Py312 upgrade step 2: Update dependencies, integ tests and unit tests (#5123)

* clean up

* bump maxdepth for doc/api/training to fix readthedocs

* change maxdepth for readthedocs rendering doc/api/training page

* change maxdepth for readthedocs rendering doc/api/training page

* change maxdepth for readthedocs rendering doc/api/training page

* Revert the PR changes 5122 (#5134)

* change: Allow telemetry only in supported regions

* change: Allow telemetry only in supported regions

* change: Allow telemetry only in supported regions

* change: Allow telemetry only in supported regions

* change: Allow telemetry only in supported regions

* documentation: Removed a line about python version requirements of training script which can misguide users. Training script can be of latest version based on the support provided by framework_version of the container

* feature: Enabled update_endpoint through model_builder

* fix: fix unit test, black-check, pylint errors

* fix: fix black-check, pylint errors

* fix: Added handler for pipeline variable while creating process job

* fix: Added handler for pipeline variable while creating process job

* Revert the PR changes: #5122, due to issue https://t.corp.amazon.com/P223568185/overview

* Fix: fix the issue, https://t.corp.amazon.com/P223568185/communication

* Revert PR 5122 changes, due to issues with other processor codeflows

---------

Co-authored-by: Roja Reddy Sareddy <[email protected]>
Co-authored-by: Zhaoqi <[email protected]>

* update readme to reflect py312 upgrade

* prepare release v2.243.3

* update development version to v2.243.4.dev0

* chore: add huggingface images (#5142)

* Update ModelTrainer to support s3 uri and tar.gz file as source_dir (#5144)

* add s3 uri check to modeltrainer data source

* update ModelTrainer to support s3 uri and tar.gz file as source_dir

* black-format

* add unit and integ tests

* update logic and unit test to raise value error if the file is not .tar.gz

* feature: support custom workflow deployment in ModelBuilder using SMD image. (#5143)

* feature: support custom workflow deployment in ModelBuilder using SMD image. (#1661)

* feature: support custom workflow deployment in ModelBuilder using SMD inference image.

* Rename test case and pass session.

* Address PR comments.

* Tweak resource cleanup logic in integ test.

* Fixing CodeBuild integ test failures.

* Renamed integ test.

* Remove unused integ test, restore once GA.

---------

Co-authored-by: Joseph Zhang <[email protected]>

* Cache client as instance attribute in @property decorator. (#1668)

* Remove @property decorator from ABC definition.

* Cache client as instance attribute in @property.
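
The caching pattern named above can be sketched roughly as follows, assuming the client is expensive to construct and should be built at most once per instance. `ClientHolder` and `factory` are hypothetical names for illustration, not the SDK's classes:

```python
from typing import Any, Callable, Optional


class ClientHolder:
    """Hypothetical holder that builds its client lazily and reuses it."""

    def __init__(self, factory: Callable[[], Any]) -> None:
        self._factory = factory
        self._client: Optional[Any] = None  # nothing built yet

    @property
    def client(self) -> Any:
        # Build the client on first access, then return the cached instance.
        if self._client is None:
            self._client = self._factory()
        return self._client


calls = []


def make_client() -> object:
    """Stand-in factory that records how often it is invoked."""
    calls.append(1)
    return object()


holder = ClientHolder(make_client)
first = holder.client
second = holder.client
```

Both accesses return the same object and the factory runs only once, which is the point of storing the client on the instance rather than rebuilding it inside the property.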

* Fix flake8 issue.

---------

Co-authored-by: Joseph Zhang <[email protected]>

* Bugfixes from e2e testing. (#1670)

* Fix Albatross Inference component tests

* trigger integ tests

---------

Co-authored-by: cj-zhang <[email protected]>
Co-authored-by: Joseph Zhang <[email protected]>
Co-authored-by: Pravali Uppugunduri <[email protected]>

* fix: pin mamba version to 24.11.3-2 to avoid inconsistent test runs (#5149)

Co-authored-by: Namrata Madan <[email protected]>

* Add model server timeout (#5151)

Co-authored-by: adishaa <[email protected]>

* Add Owner ID check for bucket with path when prefix is provided (#5146)

* Fix Flake8 Violations

* Add Owner ID check for bucket with path when prefix is provided

**Description**

Previously we called head_bucket to perform the owner ID check, but this doesn't account for cases where the S3 path is provided through the prefix.

This change makes sure that directory-level permissions are supported.

**Testing Done**
Tested through unit tests, integ tests and manual testing through the installation file.

* Address PR comment

* Codestyle fixes

* Minor fix

* Codestyle fixes

* Fix Unit tests

* prepare release v2.244.0

* update development version to v2.244.1.dev0

* chore: Add tei 1.6.0 image (#5145)

* chore: add huggingface images

* chore: add tei 1.6 image

* chore: add tei 1.6.0 to tei mapping in tests

* build(deps): bump mlflow in /tests/data/serve_resources/mlflow/pytorch (#5098)

Bumps [mlflow](https://github.com/mlflow/mlflow) from 2.13.2 to 2.20.3.
- [Release notes](https://github.com/mlflow/mlflow/releases)
- [Changelog](https://github.com/mlflow/mlflow/blob/master/CHANGELOG.md)
- [Commits](https://github.com/mlflow/mlflow/compare/v2.13.2...v2.20.3)

---
updated-dependencies:
- dependency-name: mlflow
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* build(deps): bump mlflow (#5155)

Bumps [mlflow](https://github.com/mlflow/mlflow) from 2.13.2 to 2.20.3.
- [Release notes](https://github.com/mlflow/mlflow/releases)
- [Changelog](https://github.com/mlflow/mlflow/blob/master/CHANGELOG.md)
- [Commits](https://github.com/mlflow/mlflow/compare/v2.13.2...v2.20.3)

---
updated-dependencies:
- dependency-name: mlflow
  dependency-version: 2.20.3
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* build(deps): bump scikit-learn (#5156)

Bumps [scikit-learn](https://github.com/scikit-learn/scikit-learn) from 1.3.2 to 1.5.1.
- [Release notes](https://github.com/scikit-learn/scikit-learn/releases)
- [Commits](https://github.com/scikit-learn/scikit-learn/compare/1.3.2...1.5.1)

---
updated-dependencies:
- dependency-name: scikit-learn
  dependency-version: 1.5.1
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Improve error logging and documentation for issue 4007 (#5153)

* Improve error logging and documentation for issue 4007

* Add hyperlink to RTDs

* fix: fix bad initialization script error message (#5152)

Co-authored-by: Namrata Madan <[email protected]>

* fix: pin test dependency (#5165)

* fix: Map llama models to correct script (#5159)

* fix: honor json serialization of HPs (#5164)

* fix: honor json serialization of HPs

* test

* fix

* chore: Allow omegaconf >=2.2,<3 (#5168)

* Fix type annotations (#5166)

* remove --strip-component for untar source tar.gz (#5163)

* remove --strip-component for untar source tar.gz

* update code.tar.gz in test

---------

Co-authored-by: Erick Benitez-Ramos <[email protected]>

* fix: parameter mismatch in update_endpoint (#5135)

* add AG v1.3 (#5171)

Co-authored-by: Ubuntu <[email protected]>

* Fix test_deploy_with_update_endpoint() (#5177)

Co-authored-by: pintaoz <[email protected]>

* huggingface-tei dlc image_uri (#5174)

Co-authored-by: pintaoz-aws <[email protected]>

* huggingface-neuronx dlc image_uri (#5172)

* huggingface-neuronx dlc image_uri

* huggingface-neuronx inference dlc

---------

Co-authored-by: pintaoz-aws <[email protected]>

* huggingface-llm-neuronx dlc (#5173)

Co-authored-by: pintaoz-aws <[email protected]>

* Fix test_huggingface_tei_uris() (#5178)

* Fix test_huggingface_tei_uris()

* Fix json

---------

Co-authored-by: pintaoz <[email protected]>

* Fix Flask-Limiter version (#5180)

* prepare release v2.244.1

* update development version to v2.244.2.dev0

* change: Improve defaults handling in ModelTrainer (#5170)

* Improve default handling

* format

* add tests & update docs

* fix docstyle

* fix input_data_config

* fix use input_data_config parameter in train as authoritative source

* fix tests

* format

* update checkpoint config

* docstyle

* make config creation backwards compatible

* format

* fix condition

* fix Compute and Networking config when attributes are None

* format

* fix

* format

* change: Add image configs and region config for TPE (ap-east-2) (#5167)

* add image configs and region config for TPE (ap-east-2)

* remove TPE from djl-neuronx

---------

Co-authored-by: isha chidrawar <[email protected]>
Co-authored-by: pintaoz-aws <[email protected]>

* change: update image_uri_configs 05-14-2025 07:18:16 PST

* change: update jumpstart region_config 05-15-2025 07:18:15 PST

* fix: clarify model monitor one time schedule bug (#5169)

* fix: include model channel for gated uncompressed models (#5181)

* prepare release v2.244.2

* update development version to v2.244.3.dev0

* change: update image_uri_configs 05-20-2025 07:18:17 PST

* feat: Correct mypy type checking through PEP 561 (#5027)

Co-authored-by: parknate@ <[email protected]>
Co-authored-by: Molly He <[email protected]>

* change: merge method inputs with class inputs (#5183)

* fix: addWaiterTimeoutHandling (#4951)

* addWaiterTimeoutHandling

* codeStyleUpdate

* updateCodeStyle

* updateCodeStyle

* updateCodeStyle

* updateCodeStyle

* updateCodeStyle

* updateCodeStyle

---------

Co-authored-by: Ubuntu <[email protected]>
Co-authored-by: Gokul Anantha Narayanan <[email protected]>
Co-authored-by: Ubuntu <[email protected]>

* MLFLow update for dependabot (#5187)

* MLFLow update for dependabot

* Update lower bound

* Unit test fixes

* prepare release v2.245.0

* update development version to v2.245.1.dev0

* feature: Triton v25.04 DLC (#5188)

Co-authored-by: Mohan Kishore <[email protected]>

* update estimator documentation regarding hyperparameters for source_dir (#5190)

* Update Attrs version to widen support (#5185)

* Update Attrs version to widen support

**Description**

https://github.com/aws/sagemaker-python-sdk/issues/5075

**Testing Done**
Running unit and integ tests

Unit and integ tests passing indicate that this version upgrade does not break anything

* Update version in conda_in_process.yml

* Update test requirements

* MLFlow update version

---
Tested by : Running unit and integ tests

* prepare release v2.246.0

* update development version to v2.246.1.dev0

* fix: Allow import failure for internal _hashlib module (#5192)

* fix: Allow import failure for _hashlib module

* Fix formatting

* Appease flake8

* Add ignore_patterns in ModelTrainer to ignore specific files/folders (#5194)

* Add ignore_patterns in ModelTrainer to ignore specific files/folders

* fix black format

* add unit test

* add default ignore_patterns, fix minor path issue when uploaded to s3

* minor change to fix unit test failure

* add new variables in default ignore_patterns

* fix indentation error in docstring for readthedocs

* Fix: Object of type ModelLifeCycle is not JSON serializable (#5197)

* Fix: Object of type ModelLifeCycle is not JSON serializable

* Fix unit test

* Fix integ tests

* Revert "Fix integ tests"

This reverts commit f6513fe430d7f7f13486239aaaf6983efde2e00f.

* Fix integration tests

---------

Co-authored-by: adishaa <[email protected]>

* change: update jumpstart region_config, update image_uri_configs 06-12-2025 07:18:12 PST

* feat: Add support for MetricDefinitions in ModelTrainer (#5202)

* feat: Add support for MetricDefinitions in ModelTrainer

* style fix

* Update model_trainer.py to generate the doc

* resolve unit test failed

* solve another unit test error

---------

Co-authored-by: Chad Chiang <[email protected]>

* prepare release v2.247.0

* update development version to v2.247.1.dev0

* change: update image_uri_configs 06-19-2025 07:18:34 PST

* prepare release v2.247.1

* update development version to v2.247.2.dev0

* change: relax protobuf to <6.32 (#5211)

* change: update image_uri_configs 06-26-2025 07:18:35 PST

* feature: integrate amtviz for visualization of tuning jobs (#5044)

* feature: integrate amtviz for visualization of tuning jobs

* Move RecordSerializer and RecordDeserializer to sagemaker.serializers and sagemaker.deserializers (#5037)

* Move RecordSerializer and RecordDeserializer to sagemaker.serializers and sagemaker.deserializers

* fix codestyle

* fix test

---------

Co-authored-by: pintaoz <[email protected]>

* Add framework_version to all TensorFlowModel examples (#5038)

* Add framework_version to all TensorFlowModel examples

* update framework_version to x.x.x

---------

Co-authored-by: pintaoz <[email protected]>

* Fix hyperparameter strategy docs (#5045)

* fix: pass in inference_ami_version to model_based endpoint type (#5043)

* fix: pass in inference_ami_version to model_based endpoint type

* documentation: update contributing.md w/ venv instructions and pip install fixes

---------

Co-authored-by: Zhaoqi <[email protected]>

* Add warning about not supporting torch.nn.SyncBatchNorm (#5046)

* Add warning about not supporting

* update wording

---------

Co-authored-by: pintaoz <[email protected]>

* prepare release v2.239.2

* update development version to v2.239.3.dev0

* change: update image_uri_configs  02-19-2025 06:18:15 PST

* fix: codestyle, type hints, license, and docstrings

* documentation: add docstring for amtviz module

* fix: fix docstyle and flake8 errors

* fix: code reformat using black

---------

Co-authored-by: Uemit Yoldas <[email protected]>
Co-authored-by: pintaoz-aws <[email protected]>
Co-authored-by: pintaoz <[email protected]>
Co-authored-by: parknate@ <[email protected]>
Co-authored-by: timkuo-amazon <[email protected]>
Co-authored-by: Zhaoqi <[email protected]>
Co-authored-by: ci <ci>
Co-authored-by: sagemaker-bot <[email protected]>

* change: update image_uri_configs 07-04-2025 07:18:27 PST

* Update TF DLC python version to py312 (#5231)

* Update TF DLC python version to py312

* catch integ version

* Bump SMD version to enable custom workflow deployment. (#5230)

* Bump SMD version to enable custom workflow deployment.

* Update SMD image uri UT.

---------

Co-authored-by: Joseph Zhang <[email protected]>

* Adding Hyperpod feature to enable hyperpod telemetry

* Adding Hyperpod feature to enable hyperpod telemetry (#5235)

* Adding Hyperpod feature to enable hyperpod telemetry

* Adding Hyperpod feature to enable hyperpod telemetry

---------

Co-authored-by: Roja Reddy Sareddy <[email protected]>

* fix: sanitize git clone repo input url (#5234)

* build(deps): bump torch in /tests/data/modules/script_mode (#5189)

Bumps [torch](https://github.com/pytorch/pytorch) from 2.0.1+cpu to 2.7.0.
- [Release notes](https://github.com/pytorch/pytorch/releases)
- [Changelog](https://github.com/pytorch/pytorch/blob/main/RELEASE.md)
- [Commits](https://github.com/pytorch/pytorch/commits/v2.7.0)

---
updated-dependencies:
- dependency-name: torch
  dependency-version: 2.7.0
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: parknate@ <[email protected]>

* build(deps): bump mlflow in /tests/data/serve_resources/mlflow/xgboost (#5218)

Bumps [mlflow](https://github.com/mlflow/mlflow) from 2.13.2 to 3.1.0.
- [Release notes](https://github.com/mlflow/mlflow/releases)
- [Changelog](https://github.com/mlflow/mlflow/blob/master/CHANGELOG.md)
- [Commits](https://github.com/mlflow/mlflow/compare/v2.13.2...v3.1.0)

---
updated-dependencies:
- dependency-name: mlflow
  dependency-version: 3.1.0
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: parknate@ <[email protected]>

* build(deps): bump protobuf from 4.25.5 to 4.25.8 in /requirements/extras (#5209)

Bumps [protobuf](https://github.com/protocolbuffers/protobuf) from 4.25.5 to 4.25.8.
- [Release notes](https://github.com/protocolbuffers/protobuf/releases)
- [Changelog](https://github.com/protocolbuffers/protobuf/blob/main/protobuf_release.bzl)
- [Commits](https://github.com/protocolbuffers/protobuf/compare/v4.25.5...v4.25.8)

---
updated-dependencies:
- dependency-name: protobuf
  dependency-version: 4.25.8
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: parknate@ <[email protected]>

* build(deps): bump requests in /tests/data/serve_resources/mlflow/pytorch (#5200)

Bumps [requests](https://github.com/psf/requests) from 2.32.2 to 2.32.4.
- [Release notes](https://github.com/psf/requests/releases)
- [Changelog](https://github.com/psf/requests/blob/main/HISTORY.md)
- [Commits](https://github.com/psf/requests/compare/v2.32.2...v2.32.4)

---
updated-dependencies:
- dependency-name: requests
  dependency-version: 2.32.4
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: parknate@ <[email protected]>

* prepare release v2.248.0

* update development version to v2.248.1.dev0

* Nova training support (#5238)

* feature: Added Amazon Nova training support for ModelTrainer and Estimator

Co-authored-by: Erick Benitez-Ramos <[email protected]>

* prepare release v2.248.1

* update development version to v2.248.2.dev0

* change: When rootlessDocker is enabled, return a fixed SageMaker IP (#5236)

* change: When rootlessDocker is enabled, return a fixed SageMaker IP

* Add logging for docker info command failure

---------

Co-authored-by: Jiali Xing <[email protected]>
Co-authored-by: Gokul Anantha Narayanan <[email protected]>

* fix: add hard dependency on sagemaker-core pypi lib (#5241)

* change: update image_uri_configs 07-18-2025 07:18:28 PST

* change: update image_uri_configs 07-22-2025 07:18:25 PST

* Relax boto3 version requirement (#5245)

* prepare release v2.248.2

* update development version to v2.248.3.dev0

* change: update image_uri_configs 07-23-2025 07:18:25 PST

* Directly use customer-provided endpoint name for ModelBuilder deployment. (#5246)

* Directly use customer-provided endpoint name for deployment in ModelBuilder.

* Fix ModelBuilder UTs after removing unique_name_from_base import.

---------

Co-authored-by: Joseph Zhang <[email protected]>

* feature: AWS Batch for SageMaker Training jobs (#5249)

---------

Co-authored-by: Greg Katkov <[email protected]>
Co-authored-by: haoxinwa <[email protected]>
Co-authored-by: JennaZhao <[email protected]>
Co-authored-by: Jessica Zhu <[email protected]>
Co-authored-by: David Lindskog <[email protected]>

* prepare release v2.249.0

* update development version to v2.249.1.dev0

* Add more constraints to test requirements (#5254)

* Add constraint file to test requirements

* Add constraints

---------

Co-authored-by: pintaoz <[email protected]>

* feature: Add support for InstancePlacementConfig in Estimator for training jobs running on ultraserver capacity (#5259)

---------

Co-authored-by: Greg Katkov <[email protected]>

* prepare release v2.250.0

* update development version to v2.250.1.dev0

* feat: support pipeline versioning (#5248)

Co-authored-by: Namrata Madan <[email protected]>
Co-authored-by: Gokul Anantha Narayanan <[email protected]>

* add sleep for model deployment (#5260)

* fix: dockerfile stuck on interactive shell (#5261)

* GPT OSS Hotfix (#5263)

* changes for gpt_oss jobs support

* added unit tests

* fixing unit test

* prepare release v2.251.0

* update development version to v2.251.1.dev0

* chore: onboard tei 1.8.0 (#5265)

* chore: onboard tei 1.8.0

* chore: fix tei tests

* feature: Greenland support for SagemakerTraining jobs (#1737)

* feature: Greenland support for SagemakerTraining jobs

* Added docstrings, tests to show sample filters for list_jobs

* Fixed linting error

* Addressed comments on rev1

* Removed unused attributes: retry_config and role_arn

* Fixes identified during end to end test

* Removed unused import

* prepare release v2.251.1

* update development version to v2.251.2.dev0

* latest tgi (#5255)

* latest tgi

* add optimum-neuron tgi

---------

Co-authored-by: sage-maker <[email protected]>

* Feature/js mlops telemetry (#5268)

* removed log statement

* added telemetry for js and mlops

* added for js estimator

* fixed unit tests

---------

Co-authored-by: Mohamed Zeidan <[email protected]>

* feature: add eval custom lambda arn to hyperparameters (#5272)

* fix: add retryable option to emr step in SageMaker Pipelines (#5281)

* Add nova custom lambda in hyperparameter from estimator (#5282)

* Add nova custom lambda in hyperparameter from estimator

* Add nova custom lambda in hyperparameter from estimator

* feat: change S3 endpoint env name (#5264)

* fix: handle trial component status message longer than API supports (#5276)

* merge rba without the iso region changes (#5290)

* change: update image_uri_configs 08-28-2025 07:18:37 PST

* change: update image_uri_configs 09-03-2025 07:18:37 PST

* change: update image_uri_configs 09-05-2025 07:18:30 PST

* change: update jumpstart region_config 09-17-2025 07:18:39 PST

* Revert "change: update image_uri_configs 08-28-2025 07:18:37 PST"

This reverts commit 96ea39db00c36050cc5478bd13f14e8c5f9347db.

---------

Co-authored-by: sagemaker-bot <[email protected]>
Co-authored-by: Eli Davidson <[email protected]>

* Remove tags field from greenland job submitter (#1738)

* Remove tags field from greenland job submitter

* Update tests

* allow is_production to be passed in

* Add response in message

---------

Co-authored-by: JieShen Ong <[email protected]>

* prepare release v2.252.0

* update development version to v2.252.1.dev0

* feature: add model_type hyperparameter support for Nova recipes (#5291)

Co-authored-by: xibei chen <[email protected]>

* Fix flaky integ test (#5294)

Co-authored-by: pintaoz <[email protected]>

* fix: djl regions fixes #5273 (#5277)

* test: adds unit test for djl lmi regions

* test: adds regions in which djl images do not exist

* fix: adds djl missing regions

* fix: linting

* docs: update contributing to add linting section

---------

Co-authored-by: pintaoz-aws <[email protected]>

* Adding default identity implementations to InferenceSpec (#5278)

Co-authored-by: pintaoz-aws <[email protected]>

* feature: Added condition to allow eval recipe. (#5298)

* feature: Added condition to allow eval recipe.

* change: renamed is_nova_recipe to is_nova_or_eval_recipe

* chore: domain support for eu-isoe-west-1 (#5292)

* Add numpy 2.0 support (#5199)

* Add numpy 2.0 support

* Add numpy 2.0 support

* Add numpy 2.0 support

* Add numpy 2.0 support

* Add numpy 2.0 support

* Fix incompatible_dependecies test

* Fix incompatible_dependecies test

* Fix incompatible_dependecies test

* Fix incompatible_dependecies test

* Fix incompatible_dependecies test

* update tensorflow artifacts

* update tensorflow artifacts

* update tensorflow artifacts

* testfile codestyle fixes

* testfile codestyle fixes

* update SKLearn image URI config

* update SKLearn image URI config

* docstyle fixes

* docstyle fixes

* numpy fixes

* numpy fixes

* numpy fixes

* numpy fixes

* numpy fixes

* numpy fixes

* numpy fixes

* numpy fixes

---------

Co-authored-by: Roja Reddy Sareddy <[email protected]>
Co-authored-by: parknate@ <[email protected]>
Co-authored-by: Gokul Anantha Narayanan <[email protected]>

* Fix for a failed slow test: numpy fix (#5304)

* Add numpy 2.0 support

* Add numpy 2.0 support

* Add numpy 2.0 support

* Add numpy 2.0 support

* Add numpy 2.0 support

* Fix incompatible_dependecies test

* Fix incompatible_dependecies test

* Fix incompatible_dependecies test

* Fix incompatible_dependecies test

* Fix incompatible_dependecies test

* update tensorflow artifacts

* update tensorflow artifacts

* update tensorflow artifacts

* testfile codestyle fixes

* testfile codestyle fixes

* update SKLearn image URI config

* update SKLearn image URI config

* docstyle fixes

* docstyle fixes

* numpy fixes

* numpy fixes

* numpy fixes

* numpy fixes

* numpy fixes

* numpy fixes

* numpy fixes

* numpy fixes

* numpy fix for slow test

* numpy fix for slow test

* numpy fix for slow test

* numpy fix for slow test

---------

Co-authored-by: Roja Reddy Sareddy <[email protected]>
Co-authored-by: parknate@ <[email protected]>
Co-authored-by: Gokul Anantha Narayanan <[email protected]>

* Revert "Merge branch 'master-greenland' into master"

This reverts commit 0ffec8a63af96c35f10663cd60832a807c0f6e16, reversing
changes made to 6414203828e8c32bcae868b9ae18c172e8aedf38.

* prepare release v2.253.0

* update development version to v2.253.1.dev0

* add TEI 1.8.2 (#5305)

* add TEI 1.8.2

* add test

* [hf-tei] add image uri to utils (#5287)

* tei

* tests

---------

Co-authored-by: pintaoz-aws <[email protected]>
Co-authored-by: Molly He <[email protected]>

* Revert the change "Add Numpy 2.0 support" (#5307)

* Add numpy 2.0 support

* Add numpy 2.0 support

* Add numpy 2.0 support

* Add numpy 2.0 support

* Add numpy 2.0 support

* Fix incompatible_dependecies test

* Fix incompatible_dependecies test

* Fix incompatible_dependecies test

* Fix incompatible_dependecies test

* Fix incompatible_dependecies test

* update tensorflow artifacts

* update tensorflow artifacts

* update tensorflow artifacts

* testfile codestyle fixes

* testfile codestyle fixes

* update SKLearn image URI config

* update SKLearn image URI config

* docstyle fixes

* docstyle fixes

* numpy fixes

* numpy fixes

* numpy fixes

* numpy fixes

* numpy fixes

* numpy fixes

* numpy fixes

* numpy fixes

* numpy fix for slow test

* numpy fix for slow test

* numpy fix for slow test

* numpy fix for slow test

* Revert 'Add numpy 2.0 support'

* Revert 'Add numpy 2.0 support'

* Revert 'Add numpy 2.0 support'

---------

Co-authored-by: Roja Reddy Sareddy <[email protected]>
Co-authored-by: parknate@ <[email protected]>
Co-authored-by: Gokul Anantha Narayanan <[email protected]>

* Update instance type regex to also include hyphens (#5308)

* Revert "Merge branch 'master-greenland' into master" (#1747)

This reverts commit 0ffec8a63af96c35f10663cd60832a807c0f6e16, reversing
changes made to 6414203828e8c32bcae868b9ae18c172e8aedf38.

* prepare release v2.253.1

* update development version to v2.253.2.dev0

* [hf] HF Inference TGI (#5302)

* image

* tests

---------

Co-authored-by: Gokul Anantha Narayanan <[email protected]>

* [Hugging Face][Pytorch] Inference DLC 4.51.3 (#5271)

* new image

* Update src/sagemaker/image_uri_config/huggingface.json

removed missing CPU image

* add cpu back

---------

Co-authored-by: Molly He <[email protected]>

* add HF Optimum Neuron DLCs (#5309)

* add image

* inf on dlc

* neuron tgi dlcs

* fix test

---------

Co-authored-by: Zhaoqi <[email protected]>

* feat: Triton v25.09 DLC (#5314)

* Add Numpy 2.0 support (#5311)

* Add numpy 2.0 support

* Add numpy 2.0 support

* Add numpy 2.0 support

* Add numpy 2.0 support

* Add numpy 2.0 support

* Fix incompatible_dependecies test

* Fix incompatible_dependecies test

* Fix incompatible_dependecies test

* Fix incompatible_dependecies test

* Fix incompatible_dependecies test

* update tensorflow artifacts

* update tensorflow artifacts

* update tensorflow artifacts

* testfile codestyle fixes

* testfile codestyle fixes

* update SKLearn image URI config

* update SKLearn image URI config

* docstyle fixes

* docstyle fixes

* numpy fixes

* numpy fixes

* numpy fixes

* numpy fixes

* numpy fixes

* numpy fixes

* numpy fixes

* numpy fixes

* numpy fix for slow test

* numpy fix for slow test

* numpy fix for slow test

* numpy fix for slow test

* Revert 'Add numpy 2.0 support'

* Revert 'Add numpy 2.0 support'

* Revert 'Add numpy 2.0 support'

* Add numpy 2.0 support

* Add numpy 2.0 support

* Add numpy 2.0 support

* Add numpy 2.0 support

* Add numpy 2.0 support

* Add numpy 2.0 support

* Add numpy 2.0 support

* Add numpy 2.0 support

* Add numpy 2.0 support

---------

Co-authored-by: Roja Reddy Sareddy <[email protected]>
Co-authored-by: parknate@ <[email protected]>
Co-authored-by: Gokul Anantha Narayanan <[email protected]>

* prepare release v2.254.0

* update development version to v2.254.1.dev0

* [hf] HF PT Training DLCs (#5301)

* image

* add py312

* fix

* test fix

* typo

---------

Co-authored-by: Molly He <[email protected]>

* fix: update get_execution_role to directly return the ExecutionRoleArn if it presents in the resource metadata file (#5315)

Co-authored-by: Jun Lyu <[email protected]>

* Updating RegisterModel step with new params (#1766)

* Adding Model package registration field for registermodel step

* Adding base model for register model step

* Fixes for baseModel

* Fix unit tests

* Adding tests for model, fixing checkstyle

* Modifying BaseModel to ContainerBaseModel

* Fixes

* Fixing checkstyle

* fixing imports

* fix for unit test

* Fix unit tests

* Keynote3 kandinsky (#1833)

* feature: Add support for SFT recipes

- Add unit tests

* Rename model_type and recipe

* Feat: Add support for LLMFT in ModelTrainer and add unit tests

* Upload verl recipe to S3 along with llmft

* Update llmft check to reflect the new recipe structure

---------

Co-authored-by: Ankita Agarwal <[email protected]>
Co-authored-by: Ankita Agarwal <[email protected]>
Co-authored-by: appari <[email protected]>

* zimmer merged to master-v2

* numpy fix

* fixed numpy version

* more merge conflict

* more merge conflicts

* estimator fix

* requirements conflict

* pyproj fix

* smcore<2.0.0

* Restrict sagemaker-core version to less than 2.0.0 (#1917)

* malav changes w smcore<2.0.0

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: pintaoz-aws <[email protected]>
Co-authored-by: pintaoz <[email protected]>
Co-authored-by: parknate@ <[email protected]>
Co-authored-by: timkuo-amazon <[email protected]>
Co-authored-by: Zhaoqi <[email protected]>
Co-authored-by: ci <ci>
Co-authored-by: sagemaker-bot <[email protected]>
Co-authored-by: IshaChid76 <[email protected]>
Co-authored-by: Isha Chidrawar <[email protected]>
Co-authored-by: Malav Shastri <[email protected]>
Co-authored-by: malavhs <[email protected]>
Co-authored-by: Ben Crabtree <[email protected]>
Co-authored-by: Erick Benitez-Ramos <[email protected]>
Co-authored-by: cj-zhang <[email protected]>
Co-authored-by: Joseph Zhang <[email protected]>
Co-authored-by: rsareddy0329 <[email protected]>
Co-authored-by: Roja Reddy Sareddy <[email protected]>
Co-authored-by: Keshav Chandak <[email protected]>
Co-authored-by: Keshav Chandak <[email protected]>
Co-authored-by: Rohan Narayan <[email protected]>
Co-authored-by: varunmoris <[email protected]>
Co-authored-by: Gary Wang <[email protected]>
Co-authored-by: Bruno Pistone <[email protected]>
Co-authored-by: Rohan Gujarathi <[email protected]>
Co-authored-by: Rohan Gujarathi <[email protected]>
Co-authored-by: Julian Grimm <[email protected]>
Co-authored-by: Gokul Anantha Narayanan <[email protected]>
Co-authored-by: rrrkharse <[email protected]>
Co-authored-by: evakravi <[email protected]>
Co-authored-by: Victor Zhu <[email protected]>
Co-authored-by: ruiliann666 <[email protected]>
Co-authored-by: Namrata Madan <[email protected]>
Co-authored-by: Namrata Madan <[email protected]>
Co-authored-by: jkasiraj <[email protected]>
Co-authored-by: Brock Wade <[email protected]>
Co-authored-by: Brock Wade <[email protected]>
Co-authored-by: Pravali Uppugunduri <[email protected]>
Co-authored-by: Molly He <[email protected]>
Co-authored-by: Pravali Uppugunduri <[email protected]>
Co-authored-by: Aditi Sharma <[email protected]>
Co-authored-by: adishaa <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Roman A <[email protected]>
Co-authored-by: David Tippett <[email protected]>
Co-authored-by: Prateek M Desai <[email protected]>
Co-authored-by: Ubuntu <[email protected]>
Co-authored-by: pagezyhf <[email protected]>
Co-authored-by: zicanl-amazon <[email protected]>
Co-authored-by: DemyCode <[email protected]>
Co-authored-by: haozhx23 <[email protected]>
Co-authored-by: Ubuntu <[email protected]>
Co-authored-by: Ubuntu <[email protected]>
Co-authored-by: Mohan Kishore <[email protected]>
Co-authored-by: Mohan Kishore <[email protected]>
Co-authored-by: Will Childs-Klein <[email protected]>
Co-authored-by: Chad Chiang <[email protected]>
Co-authored-by: Chad Chiang <[email protected]>
Co-authored-by: uyoldas <[email protected]>
Co-authored-by: Uemit Yoldas <[email protected]>
Co-authored-by: Sirut Buasai <[email protected]>
Co-authored-by: Tritin Truong <[email protected]>
Co-authored-by: Jiali Xing <[email protected]>
Co-authored-by: Jiali Xing <[email protected]>
Co-authored-by: papriwal <[email protected]>
Co-authored-by: Greg Katkov <[email protected]>
Co-authored-by: haoxinwa <[email protected]>
Co-authored-by: JennaZhao <[email protected]>
Co-authored-by: Jessica Zhu <[email protected]>
Co-authored-by: David Lindskog <[email protected]>
Co-authored-by: Greg Katkov <[email protected]>
Co-authored-by: adtian2 <[email protected]>
Co-authored-by: Kamalakannan Hari Krishna Moorthy <[email protected]>
Co-authored-by: Mohamed Zeidan <[email protected]>
Co-authored-by: Tim Tang <[email protected]>
Co-authored-by: Timothy Wu <[email protected]>
Co-authored-by: Cuong Vu <[email protected]>
Co-authored-by: Dana Benson <[email protected]>
Co-authored-by: Eli Davidson <[email protected]>
Co-authored-by: Eli Davidson <[email protected]>
Co-authored-by: Jie Shen Ong <[email protected]>
Co-authored-by: JieShen Ong <[email protected]>
Co-authored-by: sylvie7788 <[email protected]>
Co-authored-by: xibei chen <[email protected]>
Co-authored-by: Malte Reimann <[email protected]>
Co-authored-by: aviruthen <[email protected]>
Co-authored-by: chiragvp-aws <[email protected]>
Co-authored-by: Gokul A <[email protected]>
Co-authored-by: Zhaoqi <[email protected]>
Co-authored-by: Andrew Song <[email protected]>
Co-authored-by: JunLyu <[email protected]>
Co-authored-by: Jun Lyu <[email protected]>
Co-authored-by: Madhubalasri-B <[email protected]>
Co-authored-by: CHANG-NING TSAI <[email protected]>
Co-authored-by: Ankita Agarwal <[email protected]>
Co-authored-by: Ankita Agarwal <[email protected]>
Co-authored-by: appari <[email protected]>

* changelog update (#1949)

Co-authored-by: Mohamed Zeidan <[email protected]>

* updated sagemaker-core, boto

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: pintaoz-aws <[email protected]>
Co-authored-by: pintaoz <[email protected]>
Co-authored-by: parknate@ <[email protected]>
Co-authored-by: timkuo-amazon <[email protected]>
Co-authored-by: Zhaoqi <[email protected]>
Co-authored-by: sagemaker-bot <[email protected]>
Co-authored-by: IshaChid76 <[email protected]>
Co-authored-by: Isha Chidrawar <[email protected]>
Co-authored-by: Malav Shastri <[email protected]>
Co-authored-by: malavhs <[email protected]>
Co-authored-by: Ben Crabtree <[email protected]>
Co-authored-by: Erick Benitez-Ramos <[email protected]>
Co-authored-by: cj-zhang <[email protected]>
Co-authored-by: Joseph Zhang <[email protected]>
Co-authored-by: rsareddy0329 <[email protected]>
Co-authored-by: Roja Reddy Sareddy <[email protected]>
Co-authored-by: Keshav Chandak <[email protected]>
Co-authored-by: Keshav Chandak <[email protected]>
Co-authored-by: Rohan Narayan <[email protected]>
Co-authored-by: varunmoris <[email protected]>
Co-authored-by: Gary Wang <[email protected]>
Co-authored-by: Bruno Pistone <[email protected]>
Co-authored-by: Rohan Gujarathi <[email protected]>
Co-authored-by: Rohan Gujarathi <[email protected]>
Co-authored-by: Julian Grimm <[email protected]>
Co-authored-by: Gokul Anantha Narayanan <[email protected]>
Co-authored-by: rrrkharse <[email protected]>
Co-authored-by: evakravi <[email protected]>
Co-authored-by: Victor Zhu <[email protected]>
Co-authored-by: ruiliann666 <[email protected]>
Co-authored-by: Namrata Madan <[email protected]>
Co-authored-by: Namrata Madan <[email protected]>
Co-authored-by: jkasiraj <[email protected]>
Co-authored-by: Brock Wade <[email protected]>
Co-authored-by: Brock Wade <[email protected]>
Co-authored-by: Pravali Uppugunduri <[email protected]>
Co-authored-by: Molly He <[email protected]>
Co-authored-by: Pravali Uppugunduri <[email protected]>
Co-authored-by: Aditi Sharma <[email protected]>
Co-authored-by: adishaa <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Roman A <[email protected]>
Co-authored-by: David Tippett <[email protected]>
Co-authored-by: Prateek M Desai <[email protected]>
Co-authored-by: Ubuntu <[email protected]>
Co-authored-by: pagezyhf <[email protected]>
Co-authored-by: zicanl-amazon <[email protected]>
Co-authored-by: DemyCode <[email protected]>
Co-authored-by: haozhx23 <[email protected]>
Co-authored-by: Ubuntu <[email protected]>
Co-authored-by: Ubuntu <[email protected]>
Co-authored-by: Mohan Kishore <[email protected]>
Co-authored-by: Mohan Kishore <[email protected]>
Co-authored-by: Will Childs-Klein <[email protected]>
Co-authored-by: Chad Chiang <[email protected]>
Co-authored-by: Chad Chiang <[email protected]>
Co-authored-by: uyoldas <[email protected]>
Co-authored-by: Uemit Yoldas <[email protected]>
Co-authored-by: Sirut Buasai <[email protected]>
Co-authored-by: Tritin Truong <[email protected]>
Co-authored-by: Jiali Xing <[email protected]>
Co-authored-by: Jiali Xing <[email protected]>
Co-authored-by: papriwal <[email protected]>
Co-authored-by: Greg Katkov <[email protected]>
Co-authored-by: haoxinwa <[email protected]>
Co-authored-by: JennaZhao <[email protected]>
Co-authored-by: Jessica Zhu <[email protected]>
Co-authored-by: David Lindskog <[email protected]>
Co-authored-by: Greg Katkov <[email protected]>
Co-authored-by: adtian2 <[email protected]>
Co-authored-by: Kamalakannan Hari Krishna Moorthy <[email protected]>
Co-authored-by: Mohamed Zeidan <[email protected]>
Co-authored-by: Tim Tang <[email protected]>
Co-authored-by: Timothy Wu <[email protected]>
Co-authored-by: Cuong Vu <[email protected]>
Co-authored-by: Dana Benson <[email protected]>
Co-authored-by: Eli Davidson <[email protected]>
Co-authored-by: Eli Davidson <[email protected]>
Co-authored-by: Jie Shen Ong <[email protected]>
Co-authored-by: JieShen Ong <[email protected]>
Co-authored-by: sylvie7788 <[email protected]>
Co-authored-by: xibei chen <[email protected]>
Co-authored-by: Malte Reimann <[email protected]>
Co-authored-by: aviruthen <[email protected]>
Co-authored-by: chiragvp-aws <[email protected]>
Co-authored-by: Gokul A <[email protected]>
Co-authored-by: Zhaoqi <[email protected]>
Co-authored-by: Andrew Song <[email protected]>
Co-authored-by: JunLyu <[email protected]>
Co-authored-by: Jun Lyu <[email protected]>
Co-authored-by: Madhubalasri-B <[email protected]>
Co-authored-by: CHANG-NING TSAI <[email protected]>
Co-authored-by: Ankita Agarwal <[email protected]>
Co-authored-by: Ankita Agarwal <[email protected]>
Co-authored-by: appari <[email protected]>